LLM Review Software#
This submodule provides a graphical user interface (GUI) for reviewing and annotating LLM-generated answers from research papers. Reviewers use this software to create “golden” ground truth answers that are then used in the benchmarking pipeline to evaluate LLM performance.
Overview#
The llm_review_software submodule consists of two main components:
BeeGUI (beegui.py) - A PyQt5-based GUI application for interactive review and annotation
PDF Annotator (annotator.py) - A command-line tool for annotating PDFs with bounding boxes
Purpose#
The review software allows human reviewers to:
Review LLM-generated answers from research papers
Add, edit, and validate reviewer-provided answers
Rate answer quality
Select relevant text chunks from PDFs
Generate answers_extended.json files containing “golden” ground truth answers
These “golden” answers are then used by the llm_benchmarking submodule to evaluate LLM performance against human-reviewed ground truth.
Installation#
This submodule is part of the metabeeai package. Install it via:
pip install metabeeai
Or if installing from source:
pip install -e /path/to/MetaBeeAI
Required dependencies: PyQt5, PyMuPDF (fitz), termcolor
Usage#
Launching the Review GUI#
When the metabeeai package is installed, launch the GUI using:
metabeeai review
The GUI automatically attempts to load papers from the data/papers/ directory. To use a different folder, choose File → Open Folder.
Note: For Python module syntax alternatives, see the Alternative: Python Module Syntax section below.
BeeGUI - Review Interface#
Interface Overview#
The GUI consists of three main panes:
Left Pane: Paper navigation and controls
Paper list (shows all available papers with progress percentage)
Previous/Next paper buttons
Page navigation controls
Zoom slider
Modification status indicator
Center Pane: PDF viewer
Displays the current PDF page
Shows annotations (bounding boxes) for text chunks
Supports panning (drag) and zooming (Ctrl+wheel or slider)
Hover tooltips show chunk IDs and associated questions
Right Pane: Question panel and answer fields
Question list (from answers.json)
Answer input fields:
Positive answer
Negative answer
Positive reason
Negative reason
Star rating (0-5)
Chunk ID list (when a question is selected)
Mode buttons (Individual/All)
Workflow#
1. Selecting a Paper#
Papers are listed in the left pane with their progress percentage
Click on a paper ID to load it
Use “Prev Paper” / “Next Paper” buttons to navigate
Progress shows completion percentage based on filled answer fields
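The exact formula behind the progress percentage is not documented here. The following is a minimal sketch of one plausible calculation, assuming progress is the share of the five reviewer fields, across all questions, that are filled in. The helper name `completion_percentage` and the rule for what counts as "filled" are assumptions for illustration, not BeeGUI's actual code.

```python
# Hypothetical sketch: BeeGUI's real progress formula may differ.
REVIEW_FIELDS = (
    "user_answer_positive",
    "user_answer_negative",
    "user_reason_positive",
    "user_reason_negative",
    "user_rating",
)


def completion_percentage(extended: dict) -> float:
    """Return the percentage of reviewer fields that are filled in.

    Assumption: a text field counts as filled when it is a non-blank
    string, and the rating counts as filled when it is greater than 0.
    """
    questions = extended.get("QUESTIONS", {})
    total = len(questions) * len(REVIEW_FIELDS)
    if total == 0:
        return 0.0
    filled = 0
    for entry in questions.values():
        for field in REVIEW_FIELDS:
            value = entry.get(field)
            if isinstance(value, str):
                filled += bool(value.strip())
            elif isinstance(value, (int, float)):
                filled += value > 0
    return 100.0 * filled / total
```

Under these assumptions, a paper with one question and three of its five fields filled would report 60%.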
3. Selecting Questions#
Questions appear in the right pane (loaded from answers.json)
Click on a question to select it
When selected:
Associated chunks are highlighted on the PDF
Chunk IDs are listed below the question panel
Answer fields become editable
4. Adding/Editing Answers#
For each question, reviewers can provide:
Positive Answer (user_answer_positive): The correct/positive answer
Negative Answer (user_answer_negative): What should NOT be included
Positive Reason (user_reason_positive): Reasoning for the positive answer
Negative Reason (user_reason_negative): Reasoning for the negative answer
Rating (user_rating): Star rating (0-5) indicating answer quality
Note: All changes are automatically saved to answers_extended.json in the paper folder.
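How BeeGUI performs the save internally is not described here. As a rough illustration of what an automatic save of a single field could look like, the sketch below updates one reviewer field and rewrites answers_extended.json through a temporary file, so an interrupted write cannot leave a truncated file behind. The function name `save_reviewer_answer` and the temporary-file strategy are assumptions, not the application's documented behavior.

```python
import json
import os
import tempfile
from pathlib import Path


def save_reviewer_answer(paper_dir, question_key, field, value):
    """Update one reviewer field and rewrite answers_extended.json.

    Hypothetical helper: the data is written to a temporary file in the
    same folder and then moved into place with os.replace.
    """
    path = Path(paper_dir) / "answers_extended.json"
    data = {"QUESTIONS": {}}
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    # Create the question entry on first edit, then set the field.
    data.setdefault("QUESTIONS", {}).setdefault(question_key, {})[field] = value

    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)
    os.replace(tmp_name, path)
    return data
```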
5. Annotation Modes#
Individual Mode: Shows annotations only for the currently selected question
All Mode: Shows annotations for all questions (useful for overview)
6. Viewing Chunks#
When a question is selected, associated chunk IDs are listed
Click on a chunk ID to navigate to that chunk’s location in the PDF
Hover over annotations in the PDF to see chunk IDs and related questions
Keyboard Shortcuts#
F11: Toggle full-screen mode
Up/Down arrows: Navigate paper list (when focused)
PDF Annotator#
The annotator.py script creates annotated PDFs with bounding boxes for visualization purposes.
Usage#
python -m metabeeai.llm_review_software.annotator --basepath /path/to/data
Note: For Python module syntax alternatives, see the Alternative: Python Module Syntax section below.
What It Does#
The annotator processes all papers in the papers/ directory and creates annotated PDFs:
Red boxes: Question-answer chunks (from merged_v2.json)
Blue boxes: Chunks referenced in answers.json (with field names as labels)
Output files are saved as {paper_id}_main_annotated.pdf in each paper folder.
Command-Line Arguments#
--basepath PATH: Base path containing the papers folder (default: current directory)
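To preview which files a run would produce, the naming rule above can be sketched in a few lines. This is not the annotator's own code: `parse_args` and `expected_outputs` are hypothetical helpers that only mirror the documented `--basepath` option and the `{paper_id}_main_annotated.pdf` naming, and they assume every subfolder of papers/ is a paper folder.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse a --basepath option that defaults to the current directory."""
    parser = argparse.ArgumentParser(description="List expected annotated PDFs")
    parser.add_argument("--basepath", default=".",
                        help="Base path containing the papers folder")
    return parser.parse_args(argv)


def expected_outputs(basepath):
    """Return the annotated-PDF path expected for each paper folder."""
    papers = Path(basepath) / "papers"
    if not papers.is_dir():
        return []
    return [
        folder / f"{folder.name}_main_annotated.pdf"
        for folder in sorted(papers.iterdir())
        if folder.is_dir()
    ]


if __name__ == "__main__":
    for output in expected_outputs(parse_args().basepath):
        print(output)
```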
File Structure#
Required Files for Each Paper#
Each paper folder should contain:
{paper_id}/
├── {paper_id}_main.pdf # Original PDF
├── answers.json # LLM-generated answers (input)
├── answers_extended.json # Reviewer answers (output - "golden" answers)
└── pages/
└── merged_v2.json # Processed paper chunks with grounding
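Before starting a review session, it can help to confirm that a paper folder matches this layout. The sketch below is a small standalone check, not part of the metabeeai package; `missing_files` is a hypothetical helper name, and it assumes the folder name is the paper ID.

```python
from pathlib import Path


def missing_files(paper_dir):
    """List the input files from the layout above that are absent.

    answers_extended.json is not checked because it is an output of the
    review, not an input.
    """
    paper_dir = Path(paper_dir)
    paper_id = paper_dir.name  # assumption: folder name equals paper ID
    required = [
        paper_dir / f"{paper_id}_main.pdf",
        paper_dir / "answers.json",
        paper_dir / "pages" / "merged_v2.json",
    ]
    return [
        path.relative_to(paper_dir).as_posix()
        for path in required
        if not path.is_file()
    ]
```

An empty list means all three input files are present.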
Output Format: answers_extended.json#
The GUI creates/updates answers_extended.json with the following structure:
{
"QUESTIONS": {
"question_key": {
"user_answer_positive": "Reviewer's positive answer",
"user_answer_negative": "What should NOT be included",
"user_reason_positive": "Reasoning for positive answer",
"user_reason_negative": "Reasoning for negative answer",
"user_rating": 4
}
}
}
This file serves as the “golden” ground truth for benchmarking.
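A quick structural check of a loaded answers_extended.json can catch incomplete entries before benchmarking. The sketch below assumes the five fields shown above are all expected for every question and that `user_rating` is an integer from 0 to 5 (the integer type is inferred from the example, not stated). `validate_extended` is a hypothetical helper, not a function provided by the package.

```python
EXPECTED_FIELDS = {
    "user_answer_positive",
    "user_answer_negative",
    "user_reason_positive",
    "user_reason_negative",
    "user_rating",
}


def validate_extended(data):
    """Return a list of human-readable problems; empty means no issues found."""
    questions = data.get("QUESTIONS") if isinstance(data, dict) else None
    if not isinstance(questions, dict):
        return ["top-level 'QUESTIONS' object is missing"]

    problems = []
    for key, entry in questions.items():
        if not isinstance(entry, dict):
            problems.append(f"{key}: entry is not an object")
            continue
        missing = sorted(EXPECTED_FIELDS - set(entry))
        if missing:
            problems.append(f"{key}: missing {', '.join(missing)}")
        rating = entry.get("user_rating")
        # Assumption: ratings are whole stars in the documented 0-5 range.
        if rating is not None and not (isinstance(rating, int) and 0 <= rating <= 5):
            problems.append(f"{key}: user_rating must be an integer from 0 to 5")
    return problems
```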
Integration with Benchmarking Pipeline#
The answers_extended.json files created by this review software are used as ground truth in the benchmarking pipeline:
Review Phase (this software):
Reviewers use BeeGUI to create answers_extended.json files
These contain human-reviewed “golden” answers
Benchmarking Phase (llm_benchmarking submodule):
prep_benchmark_data.py reads answers_extended.json files
Compares LLM answers (answers.json) against golden answers
Creates benchmark datasets for evaluation
See the llm_benchmarking README for details on the benchmarking workflow.
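To make the relationship between the two files concrete, the sketch below pairs each LLM entry with its golden counterpart by question key. It does not reproduce prep_benchmark_data.py; it assumes only that both files keep their questions under a top-level "QUESTIONS" object with matching keys, and it treats the contents of each entry as opaque. `load_questions` and `pair_answers` are hypothetical names.

```python
import json
from pathlib import Path


def load_questions(path):
    """Read a JSON file and return its 'QUESTIONS' object (empty if absent)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle).get("QUESTIONS", {})


def pair_answers(paper_dir):
    """Yield (question_key, llm_entry, golden_entry) for keys in both files."""
    paper_dir = Path(paper_dir)
    llm = load_questions(paper_dir / "answers.json")
    golden = load_questions(paper_dir / "answers_extended.json")
    for key in llm:
        if key in golden:
            yield key, llm[key], golden[key]
```

Questions that have no reviewer entry yet are skipped, which is one reason to complete the review before benchmarking.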
Tips and Best Practices#
For Reviewers#
Start with Progress Overview: Check the progress percentage in the paper list to see completion status
Use Annotation Modes:
Use “Individual” mode when focusing on one question
Use “All” mode to see all annotations and avoid duplicates
Navigate Efficiently:
Use keyboard shortcuts for faster navigation
Click chunk IDs to jump to specific locations
Complete All Fields: For best benchmarking results, fill in all answer fields (positive/negative answers and reasons)
Verify Saves: Changes are saved automatically; to confirm a save, check the “Modified” timestamp
Data Quality#
Ensure answers_extended.json files are complete before running benchmarking
Reviewers should be consistent in their rating criteria
Check that chunk IDs in answers match actual PDF content
Troubleshooting#
Issue: GUI doesn’t launch#
Solution: Ensure PyQt5 is installed:
pip install PyQt5
Issue: Papers not showing in list#
Check:
Papers directory exists and contains paper folders
Each paper folder has {paper_id}_main.pdf and pages/merged_v2.json
Paper folder names are numeric (e.g., “002”, “003”)
Issue: PDF not displaying#
Check:
PDF file exists: {paper_id}_main.pdf
File is not corrupted
PyMuPDF (fitz) is installed:
pip install pymupdf
Issue: Questions not appearing#
Check:
answers.json exists in the paper folder
File has valid JSON structure with “QUESTIONS” key
File encoding is UTF-8
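The three checks above can be run in one pass with a short script. This is a standalone diagnostic sketch, not part of the package; `diagnose_answers_file` is a hypothetical helper that returns a short message describing the first problem it finds.

```python
import json
from pathlib import Path


def diagnose_answers_file(paper_dir):
    """Report the first problem found with answers.json, or 'ok'."""
    path = Path(paper_dir) / "answers.json"
    if not path.is_file():
        return "answers.json not found"
    try:
        # Decode explicitly so encoding errors are reported separately.
        data = json.loads(path.read_bytes().decode("utf-8"))
    except UnicodeDecodeError:
        return "file is not valid UTF-8"
    except json.JSONDecodeError as exc:
        return f"invalid JSON at line {exc.lineno}, column {exc.colno}"
    if not isinstance(data, dict) or "QUESTIONS" not in data:
        return 'top-level "QUESTIONS" key is missing'
    return "ok"
```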
Issue: Answers not saving#
Check:
Write permissions on the paper folder
Disk space available
Console output for error messages
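The first two checks can be scripted with the standard library. The sketch below is a standalone diagnostic, with `can_save` as a hypothetical helper name and the 1 MB free-space threshold chosen arbitrarily for illustration.

```python
import os
import shutil


def can_save(paper_dir, min_free_bytes=1_000_000):
    """Return (writable, enough_space) for the given paper folder.

    writable: whether the current user may write to the folder.
    enough_space: whether the disk holding the folder has at least
    min_free_bytes free (an arbitrary illustrative threshold).
    """
    writable = os.access(paper_dir, os.W_OK)
    free = shutil.disk_usage(paper_dir).free
    return writable, free >= min_free_bytes
```

If either value is False, saving answers_extended.json in that folder is likely to fail.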
Configuration#
The GUI automatically detects the papers directory from the metabeeai.config module. Default location is data/papers/.
You can change the directory via the File → Open Folder menu option. The selected directory is remembered for the rest of the session.
Future Enhancements#
The following features are planned for future versions:
Screenshots and detailed visual guides for paper selection
Step-by-step instructions for adding responses
Batch processing capabilities
Export/import functionality for reviewer annotations
Collaborative review features
References#
Benchmarking Pipeline: See llm_benchmarking submodule README
PDF Processing: See process_pdfs submodule README
LLM Pipeline: See metabeeai_llm submodule README
Alternative: Python Module Syntax#
Instead of using the CLI commands, you can also run the scripts directly as Python modules. This is useful if you need to integrate the functionality into other Python scripts or prefer direct module execution.
Launching the Review GUI#
metabeeai review
Running the PDF Annotator#
python -m metabeeai.llm_review_software.annotator --basepath /path/to/data
Example with Custom Options#
# Annotate PDFs with custom base path
python -m metabeeai.llm_review_software.annotator --basepath /custom/path/to/data
All command-line arguments are identical between CLI commands and Python module syntax. The only difference is the invocation method.
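When the annotator needs to be called from another Python script, one option is to run the module in a subprocess. The sketch below builds the same command shown above and runs it with the current interpreter; `annotator_command` and `run_annotator` are hypothetical wrapper names, and this approach assumes the metabeeai package is installed in the active environment.

```python
import subprocess
import sys


def annotator_command(basepath):
    """Build the argument list for the documented module invocation."""
    return [
        sys.executable,
        "-m",
        "metabeeai.llm_review_software.annotator",
        "--basepath",
        str(basepath),
    ]


def run_annotator(basepath):
    """Run the annotator; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(annotator_command(basepath), check=True)
```

Using `sys.executable` keeps the subprocess in the same environment as the calling script, which avoids picking up a different Python installation from the PATH.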
Support#
For issues or questions:
Check this README first
Review error messages in console output
Verify file structure matches requirements
Check that all dependencies are installed
Last Updated: Nov 21 2025