MetaBeeAI

MetaBeeAI is an open-source, modular pipeline for extracting structured information from scientific papers for systematic review and meta-analysis in biology.

Use these docs to install MetaBeeAI, configure reproducible processing runs, understand the major modules, and navigate the full workflow from raw papers to structured outputs.

Start with Installation and Quick Start. If you already have the package running, jump straight to Configuration, Workflow, or Reference.

Explore The Docs

Install MetaBeeAI

Set up the package, configure required keys, and verify that the CLI is available.

Installation Guide

Quick Start

Follow the shortest path from installation to processing a set of PDFs.

Quick Start Guide

User Guide

Read the end-to-end guides for setup, configuration, workflow, benchmarking, and troubleshooting.

User Guide

Reference

Find API docs, submodule overviews, and lower-level reference material in one place.

Reference

Developer Guide

Find contribution, configuration-development, and documentation-maintenance guides for working on MetaBeeAI itself.

Developer Guide

Pipeline Guides

Setup

Prepare local directories, expected file layout, and the project structure used by the pipeline.

Setup Guide

Configuration

Understand configuration files, environment variables, defaults, and precedence.

Configuration Guide

Workflow

Follow the end-to-end processing flow from raw PDFs to structured outputs.

Complete Workflow

Benchmarking

Compare model behaviour and evaluate extraction quality across runs.

Benchmarking

Data Analysis

Work with processed outputs and downstream analysis steps after extraction is complete.

Data Analysis

Troubleshooting

Diagnose installation issues, pipeline failures, and common runtime problems.

Troubleshooting Guide

Reference And Internals

Submodules

Read higher-level descriptions of the major pipeline components and what each one is responsible for.

Submodules

PDF Processing Pipeline

Understand how PDFs are split, merged, deduplicated, and prepared for downstream extraction.

PDF Processing Pipeline

LLM Pipeline

Review the prompts, extraction flow, and higher-level orchestration behind the LLM stage.

LLM Pipeline

Query And Analysis Tools

Find the query and analysis layer used for working with processed outputs.

Query Database - Data Extraction and Analysis

Development

Developer Guide

Set up a development environment, run checks, and navigate the project’s developer documentation.

Developer Guide

Config Development

Extend the configuration layer safely and consistently.

Configuration System - Developer Guide

Documentation

Build, debug, and maintain the Sphinx documentation stack.

Building Documentation