run_batch_deduplicate
- metabeeai.process_pdfs.run_batch_deduplicate(base_dir=None, dry_run=False, start_paper=None, end_paper=None, folder_list=None)
Process every merged_v2.json file found in the paper folders under the base directory.
- Parameters:
base_dir (Path) – Base directory containing the paper folders.
dry_run (bool) – If True, only analyze the files and report results, without making any changes.
start_paper (int) – First paper number to process (inclusive); applies to numerically named folders.
end_paper (int) – Last paper number to process (inclusive); applies to numerically named folders.
folder_list (list) – Names of the folders to process; when given, overrides start_paper and end_paper.
- Returns:
Summary of processing results.
- Return type:
Dict[str, Any]
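The parameter descriptions state that folder_list overrides the numeric range and that start_paper/end_paper are inclusive bounds for numerically named folders. The sketch below illustrates that selection rule. It is not the library's implementation. `select_paper_folders` is a hypothetical helper, and two behaviours are assumptions: names in folder_list that are not existing directories are skipped, and without folder_list only numerically named folders are considered.

```python
from pathlib import Path
from typing import List, Optional


def select_paper_folders(
    base_dir: Path,
    start_paper: Optional[int] = None,
    end_paper: Optional[int] = None,
    folder_list: Optional[List[str]] = None,
) -> List[Path]:
    """Hypothetical helper: pick the paper folders a batch run would visit."""
    base_dir = Path(base_dir)

    if folder_list is not None:
        # An explicit list overrides the numeric range entirely.
        # Assumption: names that are not existing directories are skipped.
        return [base_dir / name for name in folder_list if (base_dir / name).is_dir()]

    # Assumption: only numerically named folders take part in range selection.
    numeric = [p for p in base_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    selected = []
    for folder in sorted(numeric, key=lambda p: int(p.name)):  # numeric, not lexical, order
        number = int(folder.name)
        if start_paper is not None and number < start_paper:
            continue  # before the inclusive lower bound
        if end_paper is not None and number > end_paper:
            continue  # past the inclusive upper bound
        selected.append(folder)
    return selected
```

Sorting by the integer value keeps folder "10" after folder "2", which plain string sorting would not. With the documented signature, a dry run over papers 1 to 50 might be written as `run_batch_deduplicate(base_dir=Path("papers"), dry_run=True, start_paper=1, end_paper=50)`. The `papers` path is only illustrative.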