run_batch_deduplicate
- metabeeai.process_pdfs.run_batch_deduplicate(base_dir=None, dry_run=False, start_paper=None, end_paper=None, folder_list=None)
Process all merged_v2.json files in the base directory.
- Parameters:
base_dir (Path) – Base directory containing paper folders.
dry_run (bool) – If True, only analyze without making changes.
start_paper (int) – First paper number to process (inclusive); applies to numerically named folders.
end_paper (int) – Last paper number to process (inclusive); applies to numerically named folders.
folder_list (list) – List of folder names to process (overrides start_paper/end_paper).
- Returns:
Summary of processing results.
- Return type:
Dict[str, Any]
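
The folder-selection rules described by the parameters can be illustrated with a small sketch. This is not the library's implementation: `select_folders` is a hypothetical helper written only to show the documented semantics (an explicit `folder_list` overrides the numeric range, and the range bounds are inclusive). How non-numeric folders are treated when a range is given, and what happens to names in `folder_list` that do not exist, are assumptions here.

```python
from pathlib import Path
from typing import List, Optional


def select_folders(
    base_dir: Path,
    start_paper: Optional[int] = None,
    end_paper: Optional[int] = None,
    folder_list: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: pick paper folders using the documented rules."""
    # Every immediate subdirectory of base_dir is a candidate.
    names = sorted(p.name for p in base_dir.iterdir() if p.is_dir())

    # An explicit folder_list overrides start_paper/end_paper.
    # Assumption: names that do not exist on disk are silently skipped.
    if folder_list is not None:
        wanted = set(folder_list)
        return [n for n in names if n in wanted]

    # No range given: keep every folder.
    if start_paper is None and end_paper is None:
        return names

    # Range given: keep numeric folder names within the inclusive bounds.
    # Assumption: non-numeric folders are ignored when a range is set.
    selected = []
    for name in names:
        if not name.isdecimal():
            continue
        number = int(name)
        if start_paper is not None and number < start_paper:
            continue
        if end_paper is not None and number > end_paper:
            continue
        selected.append(name)
    return sorted(selected, key=int)
```

Under the documented signature, a cautious first run would presumably look like `run_batch_deduplicate(base_dir=Path("papers"), dry_run=True, start_paper=1, end_paper=10)`, where `papers` is a placeholder directory name; with `dry_run=True` the files are analyzed without being changed.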