deduplicate_chunks
- metabeeai.process_pdfs.deduplicate_chunks(chunks)
Remove duplicate chunks based on text content while preserving chunk IDs.
- Parameters:
chunks (List[Dict[str,Any]]) – List of chunks to deduplicate.
- Returns:
Deduplicated list of chunks with merged chunk IDs.
- Return type:
List[Dict[str,Any]]
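The behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name `deduplicate_chunks_sketch` and the field names `text`, `chunk_id`, and `chunk_ids` are assumptions made for illustration, and the real chunk schema may differ.

```python
from typing import Any, Dict, List


def deduplicate_chunks_sketch(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Illustrative only: merge chunks that share identical text.

    Assumes each chunk has a "text" field and a "chunk_id" field
    (hypothetical names, not confirmed by the metabeeai documentation).
    """
    merged: Dict[str, Dict[str, Any]] = {}  # dicts preserve insertion order
    for chunk in chunks:
        text = chunk.get("text", "")
        chunk_id = chunk.get("chunk_id")
        if text not in merged:
            # First occurrence: copy the chunk and start a list of merged IDs.
            entry = {k: v for k, v in chunk.items() if k != "chunk_id"}
            entry["chunk_ids"] = []
            merged[text] = entry
        # Record the ID of every chunk that carried this text, without repeats.
        if chunk_id is not None and chunk_id not in merged[text]["chunk_ids"]:
            merged[text]["chunk_ids"].append(chunk_id)
    return list(merged.values())


example = [
    {"chunk_id": "a", "text": "Bees were exposed for 48 h."},
    {"chunk_id": "b", "text": "Mortality was recorded daily."},
    {"chunk_id": "c", "text": "Bees were exposed for 48 h."},
]
result = deduplicate_chunks_sketch(example)
```

In this sketch the first occurrence of each text decides the output order, and the IDs of later duplicates are appended to that entry, so no chunk ID is lost when a duplicate is dropped.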