Complete Workflow
=================

The MetaBeeAI pipeline can be run end-to-end or executed in stages. Each stage corresponds to one of the core submodules introduced in the :doc:`Pipeline Overview ` section.

.. note::

   Before running the workflow, ensure you have configured your API keys and environment variables as described in the :doc:`Setup Guide `.

Step 1 — Process PDFs to JSON
-----------------------------

.. code-block:: bash

   metabeeai process-pdfs

**Purpose:** Convert PDFs into structured JSON chunks using the `process_pdfs `_ submodule.

**Output:** ``data/papers/{paper_id}/pages/merged_v2.json``

Step 2 — Extract Information with LLM
-------------------------------------

.. code-block:: bash

   # Process all papers (default configuration)
   metabeeai llm

   # Predefined configurations
   metabeeai llm --preset balanced   # Fast relevance + high-quality answers
   metabeeai llm --preset fast       # Fast and cheap
   metabeeai llm --preset quality    # High quality for critical analysis

   # Process specific papers
   metabeeai llm --papers 4YD2Y4J8 76DQP2DC

   # Custom model selection
   metabeeai llm --relevance-model "openai/gpt-4o-mini" --answer-model "openai/gpt-4o"

**Purpose:** Run LLM-based extraction using the `metabeeai_llm `_ submodule.

**Input:** JSON chunks from Step 1

**Output:** ``data/papers/{paper_id}/answers.json``

Questions are defined in ``metabeeai_llm/questions.yml``.

Step 3 — Human Review (Optional)
--------------------------------

.. code-block:: bash

   metabeeai review

**Purpose:** Launch the graphical review interface provided by the `llm_review_software `_ submodule.

**Output:** ``data/papers/{paper_id}/answers_extended.json``

The interface supports PDF viewing, answer editing, and quality ratings.

Step 4 — Benchmarking and Evaluation
------------------------------------

4a. **Prepare Reviewer Data**

If you have CSV golden answers:

.. code-block:: bash

   metabeeai prep-benchmark --papers-dir /path/to/papers --output data/benchmark_data_gui.json

If you already ran the review GUI, reviewer data is read automatically from ``answers_extended.json``.

4b. **Create Benchmark Dataset**

.. code-block:: bash

   metabeeai prep-benchmark

**Purpose:** Generate benchmarking datasets using the `llm_benchmarking `_ submodule.

**Output:** ``data/benchmark_data.json`` or ``data/benchmark_data_gui.json``

4c. **Run Evaluation**

.. code-block:: bash

   # Evaluate one question at a time
   metabeeai benchmark --question design
   metabeeai benchmark --question population
   metabeeai benchmark --question welfare

   # Evaluate all questions at once
   metabeeai benchmark

**Output:** ``deepeval_results/combined_results_{question}_{timestamp}.json``

4d. **Visualize Results**

.. code-block:: bash

   metabeeai plot-metrics

**Output:** ``deepeval_results/plots/metrics_comparison.png``

4e. **Identify Problem Papers (Optional)**

.. code-block:: bash

   metabeeai edge-cases --num-cases 3

**Output:** ``edge_cases/edge-case-report.md``

Step 5 — Data Analysis
----------------------

.. code-block:: bash

   # Data analysis scripts (run as modules)
   python -m metabeeai.query_database.trend_analysis
   python -m metabeeai.query_database.network_analysis
   python -m metabeeai.query_database.investigate_bee_species
   python -m metabeeai.query_database.investigate_pesticides

**Purpose:** Perform large-scale data aggregation and analysis using the `query_database `_ submodule.

**Output:** Analytical reports, plots, and datasets in ``query_database/output/``.
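
Running All Stages in One Script
--------------------------------

Because the pipeline can be run end-to-end, the stages can be chained in a single shell script. The following is a minimal sketch, not part of MetaBeeAI. The file name ``run_pipeline.sh`` and the ``DRY_RUN``, ``PRESET``, ``RUN_REVIEW``, and ``RUN_BENCHMARK`` variables are hypothetical, and the script uses only the ``metabeeai`` subcommands and flags documented in the steps of this workflow. By default it only prints the commands it would run; set ``DRY_RUN=0`` to execute them. The review GUI is interactive, so it is skipped unless ``RUN_REVIEW=1``. Benchmarking relies on the reviewer data prepared in Step 4a, so set ``RUN_BENCHMARK=0`` if you have none.

.. code-block:: bash

   #!/usr/bin/env bash
   # run_pipeline.sh: hypothetical helper, not shipped with MetaBeeAI.
   # Chains the documented `metabeeai` stages in order.
   set -euo pipefail

   # Defaults can be overridden from the environment.
   DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands only, 0 = execute
   PRESET="${PRESET:-balanced}"         # preset for `metabeeai llm`: fast, balanced, quality
   RUN_REVIEW="${RUN_REVIEW:-0}"        # 1 = open the interactive review GUI (Step 3)
   RUN_BENCHMARK="${RUN_BENCHMARK:-1}"  # 0 = skip Step 4 when no reviewer data exists

   # Print the command in dry-run mode; otherwise execute it.
   run() {
     if [[ "$DRY_RUN" == "1" ]]; then
       echo "+ $*"
     else
       "$@"
     fi
   }

   run_pipeline() {
     # Stop early if the CLI is missing (not needed for a dry run).
     if [[ "$DRY_RUN" != "1" ]] && ! command -v metabeeai >/dev/null 2>&1; then
       echo "error: metabeeai not found on PATH" >&2
       return 1
     fi

     # Step 1: convert PDFs to JSON chunks.
     run metabeeai process-pdfs

     # Step 2: LLM extraction with the chosen preset.
     run metabeeai llm --preset "$PRESET"

     # Step 3 (optional): human review GUI.
     if [[ "$RUN_REVIEW" == "1" ]]; then
       run metabeeai review
     fi

     # Step 4: build the benchmark dataset, evaluate, plot, and list edge cases.
     if [[ "$RUN_BENCHMARK" == "1" ]]; then
       run metabeeai prep-benchmark
       run metabeeai benchmark
       run metabeeai plot-metrics
       run metabeeai edge-cases --num-cases 3
     fi

     # Step 5: analysis scripts, run as Python modules.
     local module
     for module in trend_analysis network_analysis investigate_bee_species investigate_pesticides; do
       run python -m "metabeeai.query_database.${module}"
     done
   }

   run_pipeline

For example, ``DRY_RUN=0 PRESET=fast bash run_pipeline.sh`` would run every stage except the review GUI with the ``fast`` preset. Defaulting to a dry run makes it easy to check the plan before any LLM calls are made.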