Configuration Guide#
MetaBeeAI uses a flexible configuration system that allows you to set parameters through multiple sources, with a clear hierarchy of precedence.
Configuration Hierarchy#
When MetaBeeAI looks for a configuration parameter, it checks sources in this order (highest priority first):
1. CLI Arguments: Command-line flags like --papers-dir or --data-dir
2. Config File (CLI): YAML file specified via --config /path/to/config.yaml
3. Config File (Environment): YAML file specified via the METABEEAI_CONFIG_FILE env var
4. Config File (Default): ./config.yaml in the current directory
5. Environment Variables: METABEEAI_PAPERS_DIR, OPENAI_API_KEY, etc.
6. Hardcoded Defaults: Built-in default values
Key Point: Values in config files override environment variables. Use env vars for secrets that shouldn’t be in config files, or for temporary overrides of parameters that the config file does not set.
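The lookup order above can be sketched in a few lines of Python. This is only an illustration of the precedence rules, not MetaBeeAI's actual implementation: the function name resolve_param and its arguments are hypothetical, and the config files are passed in as plain dictionaries ordered from highest to lowest priority.

```python
from typing import Any, Mapping, Optional, Sequence


def resolve_param(
    key: str,
    cli_args: Mapping[str, Any],
    config_files: Sequence[Mapping[str, Any]],
    env: Mapping[str, str],
    env_var: str,
    default: Optional[Any] = None,
) -> Any:
    """Return the first value found, checking sources from highest to lowest priority.

    Hypothetical helper for illustration only; not part of the MetaBeeAI API.
    """
    # 1. CLI arguments win over everything else.
    if cli_args.get(key) is not None:
        return cli_args[key]
    # 2-4. Config files, already ordered: --config, METABEEAI_CONFIG_FILE, ./config.yaml.
    for config in config_files:
        if key in config:
            return config[key]
    # 5. Environment variables are consulted only if no config file sets the key.
    if env_var in env:
        return env[env_var]
    # 6. Fall back to the built-in default.
    return default


config_files = [{"papers_dir": "./data/papers"}]
env = {"METABEEAI_PAPERS_DIR": "/tmp/papers", "METABEEAI_LOG_LEVEL": "DEBUG"}

# The config file sets papers_dir, so the environment variable is ignored.
print(resolve_param("papers_dir", {}, config_files, env, "METABEEAI_PAPERS_DIR"))
# The config file does not set log_level, so the environment variable applies.
print(resolve_param("log_level", {}, config_files, env, "METABEEAI_LOG_LEVEL", "INFO"))
# A CLI argument beats both.
print(resolve_param("papers_dir", {"papers_dir": "/tmp/test-papers"},
                    config_files, env, "METABEEAI_PAPERS_DIR"))
```

The second call shows when an environment variable is a usable override: only when no config file defines the same parameter.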
Quick Start#
1. Copy the example config:
   cp config.example.yaml config.yaml
2. Edit config.yaml to customize your settings:
   # config.yaml
   data_dir: ./data
   papers_dir: ./data/papers
   log_level: DEBUG
3. Run any command; it automatically loads ./config.yaml. (Set your API keys before you run the second command; see the Quick Start Guide for details.)
   metabeeai llm
   metabeeai process_pdfs --start 1 --end 10
Configuration File Format#
MetaBeeAI uses YAML format for configuration files:
# Common parameters
data_dir: ./data
papers_dir: ./data/papers
results_dir: ./data/results
log_level: INFO
# API keys (better to use env vars for these!)
openai_api_key: "sk-..."
landing_api_key: "..."
# Nested settings for specific commands
llm:
  relevance_model: "gpt-4o-mini"
  answer_model: "gpt-4o"
  preset: "balanced"
process_pdfs:
  batch_size: 10
benchmark:
  model: "gpt-4o"
  batch_size: 25
  max_retries: 5
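To make the nesting concrete, here is a small sketch of how command-specific settings could be read once the YAML above has been parsed into a dictionary. The get_nested helper and its dotted-key syntax are assumptions made for this example; the guide does not say MetaBeeAI exposes such a function.

```python
from collections.abc import Mapping
from typing import Any

# The dictionary a YAML parser would produce for the example file above
# (API keys omitted).
config = {
    "data_dir": "./data",
    "papers_dir": "./data/papers",
    "results_dir": "./data/results",
    "log_level": "INFO",
    "llm": {
        "relevance_model": "gpt-4o-mini",
        "answer_model": "gpt-4o",
        "preset": "balanced",
    },
    "process_pdfs": {"batch_size": 10},
    "benchmark": {"model": "gpt-4o", "batch_size": 25, "max_retries": 5},
}


def get_nested(cfg: Mapping, dotted_key: str, default: Any = None) -> Any:
    """Look up a value such as "llm.preset" (hypothetical helper)."""
    node: Any = cfg
    for part in dotted_key.split("."):
        # Stop as soon as a level is missing or is not a mapping.
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


print(get_nested(config, "llm.preset"))
print(get_nested(config, "benchmark.batch_size"))
print(get_nested(config, "llm.temperature", 0.7))  # not set above, so the default is returned
```

Note that the two batch_size values live under different command sections, so they do not conflict.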
Common Parameters#
These parameters are available across all MetaBeeAI commands:
| Parameter | YAML Key | Environment Variable | Default |
|---|---|---|---|
| Data directory | | | |
| Papers directory | | | |
| Results directory | | | |
| Output directory | | | |
| Logs directory | | | |
| Log level | | | |
| OpenAI API key | | | None |
| Landing AI API key | | | None |
Using Environment Variables#
Environment variables are useful for:
Temporary overrides during development
Secrets that shouldn’t be committed to version control
CI/CD environments
Set environment variables in your shell:
export METABEEAI_PAPERS_DIR=/tmp/papers
export OPENAI_API_KEY=sk-your-key-here
export METABEEAI_LOG_LEVEL=DEBUG
Or use a .env file:
# .env
METABEEAI_PAPERS_DIR=/tmp/papers
OPENAI_API_KEY=sk-your-key-here
METABEEAI_LOG_LEVEL=DEBUG
Important: If a parameter is set in both a config file and an environment variable, the config file wins. This ensures config files provide stable, explicit configuration.
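For readers curious what loading a .env file involves, the following is a minimal sketch using only the standard library. It is not MetaBeeAI's loader, and the names parse_env_lines and apply_env are made up for this example. It assumes simple NAME=value lines and leaves variables already set in the shell untouched, which is a common convention for .env loaders.

```python
import os
from typing import Dict, MutableMapping


def parse_env_lines(text: str) -> Dict[str, str]:
    """Parse simple NAME=value lines, skipping blanks and comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate shell-style "export NAME=value" lines.
        if line.startswith("export "):
            line = line[len("export "):]
        name, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[name.strip()] = value.strip().strip("\"'")
    return values


def apply_env(values: Dict[str, str],
              environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy parsed values into the environment without overwriting existing ones."""
    for name, value in values.items():
        environ.setdefault(name, value)


sample = """# .env
METABEEAI_PAPERS_DIR=/tmp/papers
OPENAI_API_KEY=sk-your-key-here
METABEEAI_LOG_LEVEL=DEBUG
"""
parsed = parse_env_lines(sample)
fake_environ = {"METABEEAI_LOG_LEVEL": "INFO"}  # stands in for os.environ
apply_env(parsed, fake_environ)
print(fake_environ)
```

Whatever mechanism loads the .env file, the values still rank below config files in the hierarchy described earlier.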
Using Config Files#
Specifying Config File Location#
Three ways to specify which config file to use:
1. Automatic: Place config.yaml in your current directory:
   # MetaBeeAI will automatically find and load it
   metabeeai llm
2. CLI flag: Use --config before the command name:
   metabeeai --config /path/to/custom-config.yaml llm
3. Environment variable: Set METABEEAI_CONFIG_FILE:
   export METABEEAI_CONFIG_FILE=/path/to/config.yaml
   metabeeai llm
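The three options can be pictured as a single lookup that returns the first candidate found. The sketch below is an assumption about how such a lookup might be written, not the code MetaBeeAI runs; find_config_file is a hypothetical name, and the ordering follows the hierarchy given at the top of this guide (CLI flag, then environment variable, then ./config.yaml).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_config_file(
    cli_path: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    cwd: Optional[Path] = None,
) -> Optional[Path]:
    """Return the config file to load, or None if no candidate exists."""
    # 1. An explicit --config path has the highest priority.
    if cli_path:
        return Path(cli_path)
    # 2. Next, the METABEEAI_CONFIG_FILE environment variable.
    env_path = env.get("METABEEAI_CONFIG_FILE")
    if env_path:
        return Path(env_path)
    # 3. Finally, config.yaml in the current directory, if it exists.
    default = (cwd or Path.cwd()) / "config.yaml"
    return default if default.is_file() else None


print(find_config_file("/path/to/custom-config.yaml"))
print(find_config_file(env={"METABEEAI_CONFIG_FILE": "/path/to/config.yaml"}))
```

Explicit paths are returned without an existence check here, so that a mistyped path can be reported by whatever loads the file rather than silently replaced by ./config.yaml.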
Config File Best Practices#
DO:
Keep config.yaml in your project directory for project-specific settings
Use --config or METABEEAI_CONFIG_FILE to specify alternate config locations
Commit config.example.yaml to version control as a template
Use environment variables for API keys and secrets
DON’T:
Don’t commit config.yaml with real API keys to version control
Don’t rely on environment variables for persistent settings (use config files)
Don’t mix personal settings into project config files
Examples#
Example 1: Development Setup#
Use config file for stable settings, env vars for secrets:
# config.yaml (committed to git)
data_dir: ./data
papers_dir: ./data/papers
log_level: DEBUG
llm:
  relevance_model: "gpt-4o-mini"
  answer_model: "gpt-4o"
# .env (NOT committed to git)
OPENAI_API_KEY=sk-your-actual-key
LANDING_AI_API_KEY=your-landing-key
Example 2: Production Setup#
Override specific settings for production:
# config.production.yaml
data_dir: /data/metabeeai
papers_dir: /data/metabeeai/papers
log_level: WARNING
llm:
  preset: "quality"
Run with:
metabeeai --config config.production.yaml llm
Example 3: Temporary Override#
Use CLI args for one-off changes:
# Override papers directory just for this run
metabeeai process_pdfs --papers-dir /tmp/test-papers --start 1 --end 5
Example 4: CI/CD Environment#
Use environment variables in CI:
# .github/workflows/test.yml
env:
  METABEEAI_DATA_DIR: /tmp/ci-data
  METABEEAI_LOG_LEVEL: DEBUG
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
Command-Specific Settings#
LLM Pipeline#
llm:
  relevance_model: "gpt-4o-mini"  # Model for chunk selection
  answer_model: "gpt-4o"          # Model for answer generation
  preset: "balanced"              # fast/balanced/quality
  temperature: 0.7                # LLM temperature (optional)
  max_tokens: 2000                # Max response tokens (optional)
PDF Processing#
process_pdfs:
  batch_size: 10           # Parallel processing batch size
  skip_split: false        # Skip PDF splitting
  skip_api: false          # Skip API processing
  skip_merge: false        # Skip JSON merging
  skip_deduplicate: false  # Skip deduplication
Benchmarking#
benchmark:
  model: "gpt-4o"  # Evaluation model
  batch_size: 25   # Test cases per batch
  max_retries: 5   # Max retries per batch
  timeout: 120     # Timeout per test (seconds)
Troubleshooting#
Config Not Loading#
Check these in order:
1. Verify the file exists:
   ls -la config.yaml
2. Check the YAML syntax:
   python -c "import yaml; yaml.safe_load(open('config.yaml'))"
3. Check that you’re in the correct directory (config.yaml must be in the current directory)
4. Use --config to explicitly specify the path
Environment Variable Not Working#
Remember: Config files override environment variables
If you have papers_dir: ./data/papers in your config file, setting METABEEAI_PAPERS_DIR=/tmp/papers won’t work. Either:
Remove the parameter from the config file, OR
Use a CLI argument: --papers-dir /tmp/papers
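When debugging this situation, it can help to list which environment variables are being shadowed by the config file. The sketch below is one way to do that. The key-to-variable mapping is an assumption pieced together from names that appear in this guide (it is not an official or complete list), and shadowed_env_vars is a hypothetical helper.

```python
from typing import Any, Dict, Mapping

# Assumed pairing of YAML keys and environment variables, based only on
# names mentioned in this guide; the real mapping may differ.
ENV_VAR_FOR_KEY = {
    "data_dir": "METABEEAI_DATA_DIR",
    "papers_dir": "METABEEAI_PAPERS_DIR",
    "log_level": "METABEEAI_LOG_LEVEL",
    "openai_api_key": "OPENAI_API_KEY",
}


def shadowed_env_vars(config: Mapping[str, Any],
                      env: Mapping[str, str]) -> Dict[str, str]:
    """Map each ignored environment variable to the config value that wins."""
    shadowed: Dict[str, str] = {}
    for key, var in ENV_VAR_FOR_KEY.items():
        # Report only variables that are set, also present in the config,
        # and different from the config value.
        if key in config and var in env and str(config[key]) != env[var]:
            shadowed[var] = str(config[key])
    return shadowed


config = {"papers_dir": "./data/papers", "log_level": "DEBUG"}
env = {
    "METABEEAI_PAPERS_DIR": "/tmp/papers",
    "METABEEAI_LOG_LEVEL": "DEBUG",
    "OPENAI_API_KEY": "sk-your-key-here",
}
for var, winner in shadowed_env_vars(config, env).items():
    print(f"{var} is ignored; the config file sets {winner}")
```

In this example only METABEEAI_PAPERS_DIR is reported, because the log level agrees in both places and the API key is not in the config file.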
Checking Current Configuration#
Use --verbose or --debug to see which config values are being used:
metabeeai --verbose llm
Or programmatically check from Python:
from metabeeai.config import get_config_param, load_config
# Check what config file is loaded
config = load_config()
print(config)
# Check specific parameter
papers_dir = get_config_param("papers_dir")
print(f"Papers directory: {papers_dir}")
See Also#
Quick Start Guide - Getting started with MetaBeeAI
Troubleshooting Guide - Common issues and solutions
Configuration System - Developer Guide - Developer guide for config system