Complete Workflow

The MetaBeeAI pipeline can be run end-to-end or executed in stages. Each stage corresponds to one of the core submodules introduced in the Pipeline Overview section.

Note

Before running the workflow, ensure you have configured your API keys and environment variables as described in the Setup Guide.

Step 1 — Process PDFs to JSON

metabeeai process-pdfs

Purpose: Convert PDFs into structured JSON chunks using the process_pdfs submodule.

Output: data/papers/{paper_id}/pages/merged_v2.json

Step 2 — Extract Information with LLM

# Process all papers (default configuration)
metabeeai llm

# Predefined configurations (choose one of the following three presets to set the answer quality level you expect)
metabeeai llm --preset balanced   # Fast relevance + high-quality answers
metabeeai llm --preset fast       # Fast & cheap
metabeeai llm --preset quality    # High quality for critical analysis

# Process specific papers
metabeeai llm --papers 4YD2Y4J8 76DQP2DC

# Custom model selection
metabeeai llm --relevance-model "openai/gpt-4o-mini" --answer-model "openai/gpt-4o"

Purpose: Run LLM-based extraction using the metabeeai_llm submodule.

Input: JSON chunks from Step 1

Output: data/papers/{paper_id}/answers.json

Questions are defined in metabeeai_llm/questions.yml.

Step 3 — Human Review (Optional)

metabeeai review

Purpose: Launch the graphical review interface provided by the llm_review_software submodule.

Output: data/papers/{paper_id}/answers_extended.json

Features include PDF viewing, answer editing, and quality ratings.
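Because Steps 1 to 3 each write a known file per paper, progress can be checked by looking for those files. The following is a minimal sketch, not part of MetaBeeAI itself: the `stage_status` helper and the `STAGE_OUTPUTS` table are hypothetical names, and the sketch assumes the default `data/papers/{paper_id}/` layout described above.

```python
from pathlib import Path

# Output files named in Steps 1-3, relative to data/papers/{paper_id}/.
# The dictionary keys are labels chosen for this sketch.
STAGE_OUTPUTS = {
    "process-pdfs": Path("pages") / "merged_v2.json",
    "llm": Path("answers.json"),
    "review": Path("answers_extended.json"),
}


def stage_status(papers_dir):
    """Map each paper ID to {stage label: True if its output file exists}."""
    status = {}
    for paper_dir in sorted(Path(papers_dir).iterdir()):
        if not paper_dir.is_dir():
            continue  # skip stray files next to the paper folders
        status[paper_dir.name] = {
            stage: (paper_dir / rel_path).is_file()
            for stage, rel_path in STAGE_OUTPUTS.items()
        }
    return status


if __name__ == "__main__":
    root = Path("data/papers")  # assumed default location
    if root.is_dir():
        for paper_id, stages in stage_status(root).items():
            done = [name for name, ok in stages.items() if ok]
            print(f"{paper_id}: {', '.join(done) or 'nothing yet'}")
```

The check only tests for file presence; it does not validate the JSON content.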

Step 4 — Benchmarking and Evaluation

Outputs generated in this step are saved under the data/ folder in the project root, unless a different output path is listed below.

4a. Prepare Reviewer Data

If you have CSV golden answers:

metabeeai prep-benchmark --papers-dir /path/to/papers --output data/benchmark_data_gui.json
# If you already ran the review GUI, reviewer data is read automatically from answers_extended.json.

4b. Create Benchmark Dataset

metabeeai prep-benchmark

Purpose: Generate benchmarking datasets using the llm_benchmarking submodule.

Output: data/benchmark_data.json or data/benchmark_data_gui.json

4c. Run Evaluation

metabeeai benchmark --question design
metabeeai benchmark --question population
metabeeai benchmark --question welfare

# Evaluate all at once
metabeeai benchmark

Output: deepeval_results/combined_results_{question}_{timestamp}.json
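Repeated evaluation runs leave several result files per question, so it can help to pick out the newest one. The sketch below is an illustration under assumptions: `latest_result` is a hypothetical helper, and it orders files by modification time because the format of `{timestamp}` in the filename is not specified here.

```python
from pathlib import Path


def latest_result(results_dir, question):
    """Return the most recently modified results file for a question, or None."""
    pattern = f"combined_results_{question}_*.json"
    matches = list(Path(results_dir).glob(pattern))
    # Modification time is used because the timestamp format in the
    # filename is not documented in this guide.
    return max(matches, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    for question in ("design", "population", "welfare"):
        print(question, "->", latest_result("deepeval_results", question))
```

Note that the glob pattern would also match a question whose name merely starts with the given string, so exact question names should be used.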

4d. Visualize Results

metabeeai plot-metrics

Output: deepeval_results/plots/accuracy_geval.png, deepeval_results/plots/completeness_geval.png, deepeval_results/plots/contextual_precision.png, deepeval_results/plots/contextual_recall.png, deepeval_results/plots/faithfulness.png, deepeval_results/plots/summary_metrics.png.
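To confirm that plotting completed, the six files listed above can be checked for existence. This is a small sketch only; `missing_plots` is a hypothetical helper and the default directory is assumed to be relative to where the command was run.

```python
from pathlib import Path

# Plot filenames listed as outputs of `metabeeai plot-metrics`.
EXPECTED_PLOTS = (
    "accuracy_geval.png",
    "completeness_geval.png",
    "contextual_precision.png",
    "contextual_recall.png",
    "faithfulness.png",
    "summary_metrics.png",
)


def missing_plots(plots_dir="deepeval_results/plots"):
    """Return the expected plot filenames that are not present in plots_dir."""
    plots_dir = Path(plots_dir)
    return [name for name in EXPECTED_PLOTS if not (plots_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_plots()
    print("All plots present" if not missing else f"Missing: {', '.join(missing)}")
```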

4e. Identify Problem Papers (Optional)

metabeeai edge-cases --num-cases 3

Output: edge_cases/edge-case-report.md

Step 5 — Data Analysis

# Data analysis scripts (run as modules)
python -m metabeeai.query_database.trend_analysis
python -m metabeeai.query_database.network_analysis
python -m metabeeai.query_database.investigate_bee_species
python -m metabeeai.query_database.investigate_pesticides

Purpose: Perform large-scale data aggregation and analysis using the query_database submodule.

Output: Analytical reports, plots, and datasets in src/metabeeai/query_database/output/.
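The non-interactive commands from the steps above could be chained from a script. The following is a hedged sketch rather than an official runner: `pipeline_commands` and `run_pipeline` are hypothetical names, only commands and flags shown in this guide are used, and the optional benchmarking commands assume that reviewer data from Step 3 or Step 4a already exists. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def pipeline_commands(preset=None, benchmark=False):
    """Build the stage commands in the order given in this guide."""
    llm = ["metabeeai", "llm"]
    if preset:  # one of: balanced, fast, quality
        llm += ["--preset", preset]
    commands = [["metabeeai", "process-pdfs"], llm]
    if benchmark:
        # Assumes reviewer data is already available (Step 3 or Step 4a).
        commands += [
            ["metabeeai", "prep-benchmark"],
            ["metabeeai", "benchmark"],
            ["metabeeai", "plot-metrics"],
        ]
    return commands


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it and stop on failure unless dry_run."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the commands without executing them.
    run_pipeline(pipeline_commands(preset="balanced"))
```

Passing `dry_run=False` executes each command in turn, and `check=True` makes the script stop at the first stage that exits with an error, so later stages never run on incomplete outputs.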