deduplicate_chunks
- metabeeai.process_pdfs.deduplicate_chunks(chunks)
Remove duplicate chunks based on text content while preserving chunk IDs.
- Parameters:
chunks (List[Dict[str, Any]]) – List of chunks to deduplicate.
- Returns:
Deduplicated list of chunks with merged chunk IDs.
- Return type:
List[Dict[str, Any]]
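The behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `deduplicate_chunks_sketch`, the dictionary keys `text`, `chunk_id`, and `chunk_ids`, and the choice to keep the first occurrence of each text are all assumptions made for illustration. Check the actual source for the real chunk schema.

```python
from typing import Any, Dict, List


def deduplicate_chunks_sketch(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Illustrative only: merge chunks that share identical text.

    Assumes each chunk has a "text" key and a "chunk_id" key (hypothetical
    names). The first chunk seen for a given text is kept, and the IDs of
    all chunks with that text are collected under "chunk_ids".
    """
    merged: Dict[str, Dict[str, Any]] = {}  # text -> retained chunk
    for chunk in chunks:
        text = chunk.get("text", "")      # assumed key
        chunk_id = chunk.get("chunk_id")  # assumed key
        if text not in merged:
            kept = dict(chunk)  # shallow copy so the input is not mutated
            kept["chunk_ids"] = [chunk_id] if chunk_id is not None else []
            merged[text] = kept
        elif chunk_id is not None and chunk_id not in merged[text]["chunk_ids"]:
            merged[text]["chunk_ids"].append(chunk_id)
    # dicts preserve insertion order, so first-seen order is retained
    return list(merged.values())


example = [
    {"chunk_id": "a1", "text": "Bees forage at dawn."},
    {"chunk_id": "b2", "text": "Hives need ventilation."},
    {"chunk_id": "c3", "text": "Bees forage at dawn."},
]
result = deduplicate_chunks_sketch(example)
```

In this sketch, the two chunks sharing the same text collapse into one entry that carries both IDs, while the unique chunk passes through with a single-element ID list. Keying on exact text means that chunks differing only in whitespace or case would not be treated as duplicates; whether the real function normalizes text first is not stated in this page.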