deduplicate_chunks

metabeeai.process_pdfs.deduplicate_chunks(chunks)

Remove duplicate chunks based on text content while preserving chunk IDs.

Parameters:

chunks (List[Dict[str, Any]]) – List of chunks to deduplicate.

Returns:

Deduplicated list of chunks with merged chunk IDs.

Return type:

List[Dict[str, Any]]
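
The reference above does not spell out the chunk schema or how IDs are merged, so the following is only a minimal sketch of the general idea, not the metabeeai implementation. It assumes each chunk stores its content under a "text" key and its identifier under a "chunk_id" key, and that merged IDs are collected in a "chunk_ids" list; these key names and the helper name deduplicate_chunks_sketch are hypothetical.

```python
from typing import Any, Dict, List


def deduplicate_chunks_sketch(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Illustrative stand-in only; the real function may behave differently."""
    # Keyed by text; a dict keeps insertion order, so first-seen order is preserved.
    merged: Dict[str, Dict[str, Any]] = {}
    for chunk in chunks:
        text = chunk.get("text", "")      # assumed key name
        chunk_id = chunk.get("chunk_id")  # assumed key name
        if text not in merged:
            # First occurrence: copy the chunk and start a list of merged IDs.
            kept = dict(chunk)
            kept.pop("chunk_id", None)
            kept["chunk_ids"] = []
            merged[text] = kept
        # Record the ID once, whether this is the first copy or a duplicate.
        if chunk_id is not None and chunk_id not in merged[text]["chunk_ids"]:
            merged[text]["chunk_ids"].append(chunk_id)
    return list(merged.values())


example = [
    {"chunk_id": "a1", "text": "Bees forage at dawn."},
    {"chunk_id": "b2", "text": "Hive temperature is regulated."},
    {"chunk_id": "c3", "text": "Bees forage at dawn."},
]
result = deduplicate_chunks_sketch(example)
```

In this sketch the two chunks with identical text collapse into one entry that carries both IDs, while the unique chunk passes through with a single-element ID list. Exact string matching is used here for simplicity; whether the real function normalizes whitespace or case before comparing is not stated in this reference.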