MetaBeeAI

MetaBeeAI is an open-source, modular pipeline that extracts structured information from scientific papers to support systematic reviews and meta-analyses in biology.

Use these docs to install MetaBeeAI, configure reproducible processing runs, understand the major modules, and navigate the full workflow from raw papers to structured outputs.

Start with Installation and Quick Start. If you already have the package running, jump straight to Configuration, Workflow, or Reference.

Explore The Docs

Install MetaBeeAI

Set up the package, configure required keys, and verify that the CLI is available.

Installation Guide

Quick Start

Follow the shortest path from installation to processing a set of PDFs.

Quick Start Guide

User Guide

Read the end-to-end guides for setup, configuration, workflow, benchmarking, and troubleshooting.

Reference

Find API docs, submodule overviews, and lower-level reference material in one place.

Developer Guide

Find contribution, configuration-development, and documentation-maintenance guides for working on MetaBeeAI itself.

Pipeline Guides

Setup

Prepare local directories, expected file layout, and the project structure used by the pipeline.

Setup Guide

Configuration

Understand configuration files, environment variables, defaults, and precedence.

Configuration Guide

Workflow

Follow the end-to-end processing flow from raw PDFs to structured outputs.

Complete Workflow

Benchmarking

Compare model behaviour and evaluate extraction quality across runs.

Data Analysis

Work with processed outputs and downstream analysis steps after extraction is complete.

Troubleshooting

Diagnose installation issues, pipeline failures, and common runtime problems.

Troubleshooting Guide

Reference And Internals

Submodules

Read higher-level descriptions of the major pipeline components and what each one is responsible for.

PDF Processing Pipeline

Understand how PDFs are split, merged, deduplicated, and prepared for downstream extraction.

LLM Pipeline

Review the prompts, extraction flow, and higher-level orchestration behind the LLM stage.

Query And Analysis Tools

Find the tools for querying and analysing processed outputs.

Query Database - Data Extraction and Analysis

Development

Developer Guide

Set up a development environment, run checks, and navigate the project’s developer documentation.

Config Development

Extend the configuration layer safely and consistently.

Configuration System - Developer Guide

Documentation

Build, debug, and maintain the Sphinx documentation stack.

Building Documentation