run_batch_deduplicate

metabeeai.process_pdfs.run_batch_deduplicate(base_dir=None, dry_run=False, start_paper=None, end_paper=None, folder_list=None)

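Deduplicate the merged_v2.json files in the paper folders under base_dir. The run can optionally be restricted to a range of numbered papers or to an explicit list of folders.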

Parameters:
  • base_dir (Path, optional) – Base directory containing the paper folders.

  • dry_run (bool, optional) – If True, analyze the files without writing any changes. Defaults to False.

  • start_paper (int, optional) – Number of the first paper to process (inclusive); applies to numerically named folders.

  • end_paper (int, optional) – Number of the last paper to process (inclusive); applies to numerically named folders.

  • folder_list (list, optional) – Names of the folders to process; takes precedence over start_paper and end_paper.
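The folder-selection rules above (an explicit folder_list wins, otherwise an inclusive numeric range) can be illustrated with a minimal sketch. The helpers `select_paper_folders` and `find_merged_files` are hypothetical and not part of metabeeai. The sketch assumes that numeric paper folders have all-digit names (such as `001`) directly under base_dir, that a missing bound leaves that side of the range open, and that a folder's merged_v2.json may sit at any depth inside it.

```python
from pathlib import Path
from typing import List, Optional


def select_paper_folders(
    base_dir: Path,
    start_paper: Optional[int] = None,
    end_paper: Optional[int] = None,
    folder_list: Optional[List[str]] = None,
) -> List[Path]:
    """Hypothetical helper: choose which paper folders a batch run would visit."""
    base_dir = Path(base_dir)
    if folder_list is not None:
        # An explicit folder list takes precedence over the numeric range.
        # Assumption: names that are not existing directories are skipped.
        return [base_dir / name for name in folder_list if (base_dir / name).is_dir()]

    selected = []
    for folder in base_dir.iterdir():
        # Assumption: numeric paper folders have all-digit names such as "001".
        if not folder.is_dir() or not folder.name.isdigit():
            continue
        number = int(folder.name)
        # Both bounds are inclusive; a bound left as None is not applied.
        if start_paper is not None and number < start_paper:
            continue
        if end_paper is not None and number > end_paper:
            continue
        selected.append(folder)
    # Sort numerically so that "2" comes before "10" even without zero padding.
    return sorted(selected, key=lambda folder: int(folder.name))


def find_merged_files(folders: List[Path]) -> List[Path]:
    """Hypothetical helper: collect every merged_v2.json under the given folders."""
    return [path for folder in folders for path in sorted(folder.rglob("merged_v2.json"))]
```

Filtering by inclusive bounds keeps the numeric range and the explicit list as two independent code paths, which mirrors the documented precedence: once folder_list is given, start_paper and end_paper are ignored.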

Returns:

Summary of processing results.

Return type:

Dict[str, Any]
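How dry_run and the returned Dict[str, Any] summary might fit together is sketched below. This section does not document the deduplication rule itself, so the sketch takes it as a caller-supplied function. The name `run_batch_sketch` and the summary keys (`dry_run`, `files_processed`, `files_changed`) are assumptions for illustration, not the keys metabeeai actually returns.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List


def run_batch_sketch(
    files: List[Path],
    deduplicate: Callable[[Any], Any],
    dry_run: bool = False,
) -> Dict[str, Any]:
    """Hypothetical batch driver: apply `deduplicate` to each JSON file."""
    summary: Dict[str, Any] = {"dry_run": dry_run, "files_processed": 0, "files_changed": []}
    for path in files:
        data = json.loads(path.read_text(encoding="utf-8"))
        cleaned = deduplicate(data)  # caller-supplied rule (assumption)
        summary["files_processed"] += 1
        if cleaned != data:
            summary["files_changed"].append(str(path))
            # In dry-run mode the change is only recorded, never written back.
            if not dry_run:
                path.write_text(json.dumps(cleaned, indent=2), encoding="utf-8")
    return summary
```

Against the real function, a preview run would be a call such as `run_batch_deduplicate(base_dir=Path("papers"), dry_run=True)`, where `papers` stands in for your own base directory. Inspecting the returned summary before repeating the call with dry_run=False shows what would change without touching any files.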