metabeeai.process_pdfs

metabeeai.process_pdfs Package

Functions

split_pdfs([papers_dir, pages_per_split])

Split PDFs in the specified directory into single-page or overlapping 2-page segments.
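As a rough illustration of what "overlapping 2-page segments" can mean, the sketch below computes which page indices would go into each segment. It is not the package's implementation: the helper name `page_windows`, the 0-based indexing, and the assumption that consecutive 2-page segments share exactly one page are all hypothetical.

```python
def page_windows(num_pages, pages_per_split=1):
    """Return the 0-based page indices for each segment (hypothetical helper).

    Assumes pages_per_split is 1 (one page per segment) or 2 (sliding
    window of two pages that advances one page at a time).
    """
    if pages_per_split not in (1, 2):
        raise ValueError("pages_per_split must be 1 or 2")
    if num_pages <= 0:
        return []
    if pages_per_split == 1:
        return [[i] for i in range(num_pages)]
    if num_pages == 1:
        # A single-page document cannot form a 2-page window.
        return [[0]]
    # Each window overlaps its neighbour by one page.
    return [[i, i + 1] for i in range(num_pages - 1)]


print(page_windows(4, 2))  # [[0, 1], [1, 2], [2, 3]]
```

Under this reading, every page boundary appears inside at least one segment, so content that spans two pages is seen whole at least once.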

process_papers([papers_dir, start_folder])

Process papers in the specified directory using Vision Agentic Document Analysis, starting from an optional folder.

process_all_papers(base_papers_dir, filter_types)

adjust_and_merge_json(json_files, output_file)

run_batch_deduplicate([base_dir, dry_run, ...])

Run chunk deduplication on all merged_v2.json files in the base directory.
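A batch run of this kind first has to locate the merged_v2.json files. The sketch below shows one way to do that with the standard library. The helper name `find_merged_files` is hypothetical, and the recursive search is an assumption; the package may only look at a fixed directory depth.

```python
from pathlib import Path


def find_merged_files(base_dir, filename="merged_v2.json"):
    """List files named `filename` under base_dir (hypothetical helper).

    Assumption: the search is recursive. Results are sorted so that
    repeated runs visit papers in the same order.
    """
    return sorted(Path(base_dir).rglob(filename))
```

A dry run could simply print the returned paths without modifying any file.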

analyze_chunk_uniqueness(chunks)

Analyze the uniqueness of chunks in a paper.
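The sketch below illustrates the kind of summary such an analysis might produce. It is an assumption-laden example, not the package's code: the helper name `uniqueness_report`, the `"text"` key on each chunk, and the keys of the returned dictionary are all hypothetical.

```python
from collections import Counter


def uniqueness_report(chunks):
    """Summarise how many chunks share identical text (hypothetical helper).

    Assumes each chunk is a dict with a "text" key; surrounding whitespace
    is ignored when comparing texts.
    """
    texts = [chunk.get("text", "").strip() for chunk in chunks]
    counts = Counter(texts)
    total = len(texts)
    unique = len(counts)
    return {
        "total_chunks": total,
        "unique_texts": unique,
        "duplicate_chunks": total - unique,
        # An empty paper is treated as fully unique to avoid dividing by zero.
        "uniqueness_ratio": unique / total if total else 1.0,
    }
```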

deduplicate_chunks(chunks)

Remove duplicate chunks based on text content while preserving chunk IDs.
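One possible reading of "preserving chunk IDs" is that the first chunk with a given text is kept and the IDs of its duplicates are recorded alongside it. The sketch below follows that reading. The helper name `dedupe_by_text`, the `"text"` and `"chunk_id"` keys, and the added `"all_ids"` field are hypothetical and may not match the package's actual data layout.

```python
def dedupe_by_text(chunks):
    """Keep the first chunk per distinct text (hypothetical helper).

    Assumes each chunk is a dict with "text" and "chunk_id" keys. The IDs
    of dropped duplicates are collected in an illustrative "all_ids" list
    on the kept chunk. Input dicts are copied, not modified.
    """
    kept_by_text = {}
    unique = []
    for chunk in chunks:
        key = chunk.get("text", "").strip()
        if key in kept_by_text:
            # Duplicate text: remember its ID on the chunk we already kept.
            kept_by_text[key]["all_ids"].append(chunk.get("chunk_id"))
            continue
        kept = dict(chunk)
        kept["all_ids"] = [chunk.get("chunk_id")]
        kept_by_text[key] = kept
        unique.append(kept)
    return unique
```

Keeping the first occurrence preserves the original reading order of the paper, which matters if later steps rely on chunk position.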

process_merged_json_file(json_file_path[, ...])

Process a merged JSON file to deduplicate chunks and save the result.