Pipeline Overview
=================

The MetaBeeAI pipeline is composed of five core **submodules**, each responsible for a distinct stage of the literature review and analysis process.

.. code-block:: text

   PDFs → Vision AI Processing → LLM Analysis → Human Review → Benchmarking → Data Analysis

For details on each stage, see the corresponding section of the :doc:`submodule documentation <../submodule/index>`.

Stages
------

1. **PDF Processing → Structured JSON**

   - **Submodule:** :doc:`process_pdfs <../submodule/process_pdfs>`
   - **Purpose:** Convert PDFs into structured JSON text with layout and coordinate data
   - **Input:** PDF files
   - **Output:** JSON chunks representing extracted text and layout elements
   - **API Reference:** :doc:`Process PDFs <../api/process_pdfs>`

2. **LLM Question Answering → Extracted Information**

   - **Submodule:** :doc:`metabeeai_llm <../submodule/metabeeai_llm>`
   - **Purpose:** Use large language models to extract structured answers and citations from processed text
   - **Input:** JSON chunks
   - **Output:** Structured question–answer pairs with traceable sources
   - **API Reference:** :doc:`MetaBeeAI LLM <../api/metabeeai_llm>`

3. **Human Review & Annotation → Validated Answers**

   - **Submodule:** :doc:`llm_review_software <../submodule/llm_review_software>`
   - **Purpose:** Provide a graphical interface for human review and validation of LLM answers
   - **Input:** LLM-generated answers
   - **Output:** Human-verified and annotated answers
   - **API Reference:** :doc:`LLM Review Software <../api/llm_review_software>`

4. **Benchmarking → Performance Metrics**

   - **Submodule:** :doc:`llm_benchmarking <../submodule/llm_benchmarking>`
   - **Purpose:** Evaluate model performance against human-reviewed ground truth
   - **Input:** LLM and reviewer answers
   - **Output:** Quantitative metrics, comparisons, and performance plots
   - **API Reference:** :doc:`LLM Benchmarking <../api/llm_benchmarking>`

5. **Data Analysis → Insights**

   - **Submodule:** :doc:`query_database <../submodule/query_database>`
   - **Purpose:** Aggregate validated data across studies and perform trend and network analyses
   - **Input:** Structured and benchmarked results
   - **Output:** Analytical summaries, visualizations, and derived datasets
   - **API Reference:** :doc:`Query Database <../api/query_database>`
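
The data flow between the stages above can be sketched as a small dependency table. This is only an illustration and not part of the MetaBeeAI API: the ``Stage`` class, the ``check_order`` helper, and the short artifact labels (``json_chunks``, ``llm_answers``, and so on) are hypothetical names chosen for this example. Only the submodule names come from the list above, and the inputs given for ``query_database`` are one reading of "structured and benchmarked results".

.. code-block:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class Stage:
       """One pipeline stage (hypothetical helper, not a MetaBeeAI class)."""

       submodule: str  # submodule name as listed in the stages above
       inputs: tuple   # artifact labels consumed (illustrative shorthand)
       output: str     # artifact label produced (illustrative shorthand)


   # Stage order follows the numbered list; artifact labels are assumptions.
   STAGES = (
       Stage("process_pdfs", ("pdfs",), "json_chunks"),
       Stage("metabeeai_llm", ("json_chunks",), "llm_answers"),
       Stage("llm_review_software", ("llm_answers",), "reviewed_answers"),
       Stage("llm_benchmarking", ("llm_answers", "reviewed_answers"), "metrics"),
       Stage("query_database", ("reviewed_answers", "metrics"), "insights"),
   )


   def check_order(stages, sources=("pdfs",)):
       """Return True if every stage consumes only artifacts that are
       raw sources or outputs of an earlier stage in the sequence."""
       available = set(sources)
       for stage in stages:
           if any(name not in available for name in stage.inputs):
               return False
           available.add(stage.output)
       return True

Under these assumptions, ``check_order(STAGES)`` returns ``True``, while ``check_order(STAGES[1:])`` returns ``False`` because the LLM stage would have no JSON chunks to read. The table also makes it visible that benchmarking depends on two earlier stages rather than only on the one before it.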