CVE-2025-6211: Document Chunk ID Hash Collision

Provally Curated | Public repository | Medium | Medium confidence | Verified | Apache-2.0 | Python
greprules fetch cve-2025-6211-document-chunk-id-hash-collision --engine opengrep

Description

A document chunk ID is generated by hashing only the chunk's text content (`$OBJ.text`), even though structural metadata (`$OBJ.$PROP`) is available and is included in the output. When two different structural elements contain identical text, they receive the same ID, so one chunk can silently overwrite the other wherever chunks are keyed by ID. Include the structural properties in the hash input so that chunks with the same text but different structure get distinct IDs.
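The collision described above can be reproduced in a few lines. This is only a minimal sketch, not the affected project's code: the `Chunk` class, its `element_type` and `page` fields, and both helper functions are hypothetical names chosen for illustration, and SHA-256 is an assumed hash function.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical chunk type; the field names are illustrative only.
    text: str
    element_type: str
    page: int


def chunk_id_text_only(chunk: Chunk) -> str:
    # Pattern the rule flags: only the text feeds the hash.
    return hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()


def chunk_id_with_structure(chunk: Chunk) -> str:
    # Suggested direction: hash the text together with structural metadata.
    # JSON serialization keeps field boundaries unambiguous, unlike plain
    # string concatenation.
    payload = json.dumps(
        {
            "text": chunk.text,
            "element_type": chunk.element_type,
            "page": chunk.page,
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two different structural elements that happen to share the same text.
heading = Chunk(text="Overview", element_type="heading", page=1)
table_cell = Chunk(text="Overview", element_type="table_cell", page=4)

# Keyed by the text-only ID, the second chunk overwrites the first.
store_text_only = {chunk_id_text_only(c): c for c in (heading, table_cell)}

# Keyed by the structure-aware ID, both chunks are kept.
store_structured = {chunk_id_with_structure(c): c for c in (heading, table_cell)}
```

In this sketch `store_text_only` ends up with a single entry, while `store_structured` keeps both chunks. Which structural properties belong in the hash depends on what actually distinguishes elements in the real document model.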