- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: 10.10-HF63, 2021.23
- Component/s: Bulk, Elasticsearch
- Release Notes Summary: Record overflow is avoided during bulk indexing of huge fulltext
- Tags:
- Team: PLATFORM
- Sprint: nxplatform #65
- Story Points: 3
Bulk indexing materializes the query into a stream; each record carries the Elasticsearch request with the JSON representation of the document, including the fulltext field. If this exceeds the maximum record size (1 MB), there is a record overflow:
WARN Indexing request for doc: 0ae55b22-df2d-46a4-86b5-da86b695e66f, is too large: 1503580, max record size: 900000

followed by:

ERROR bulk/index: Error during checkpoint, processing will be duplicated: bulk/index: CHECKPOINT FAILURE: Resuming with possible duplicate processing. "org.nuxeo.lib.stream.computation.log.ComputationRunner$CheckPointException","cause":{"commonElementCount":10,"localizedMessage":"Unable to send record: ProducerRecord(topic=nuxeo-bulk-bulkIndex, partition=6, headers=RecordHeaders(headers = [], isReadOnly = true), key=84b52c6c-4049-40d8-a0bf-5855bd2edcbe:5094-6, value=\\xC3\\x01\\x98\\xD4\\xE8s\\x ....
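A minimal, self-contained sketch of the overflow condition, with hypothetical values: the stream record carries the Elasticsearch request containing the whole document JSON, so a large fulltext field alone can push it past the maximum record size.

```java
import java.nio.charset.StandardCharsets;

public class OverflowCondition {
    public static void main(String[] args) {
        int maxRecordSize = 900_000; // effective limit reported in the WARN above
        String fulltext = "lorem ipsum ".repeat(150_000); // ~1.8 MB of extracted text (illustrative)
        String esRequestJson = "{\"ecm:uuid\":\"0ae55b22-df2d-46a4-86b5-da86b695e66f\","
                + "\"ecm:fulltext\":\"" + fulltext + "\"}";
        int size = esRequestJson.getBytes(StandardCharsets.UTF_8).length;
        // The whole JSON, fulltext included, becomes the record payload: overflow=true here
        System.out.println("record size=" + size + " overflow=" + (size > maxRecordSize));
    }
}
```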
Obviously, the retry mechanism does not help here.
There should be an overflow filter by default to avoid this, as is done for csvExport (see NXP-30796).
Also, we should not dump the entire record on error; this messes up DD logs.
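A hypothetical sketch of the proposed default overflow filter, in the spirit of the csvExport fix (NXP-30796): records above the limit are skipped, and only the document id and sizes are logged, never the record payload itself. The class and method names are illustrative, not the actual Nuxeo API.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class RecordOverflowFilter {

    private static final Logger log = LogManager.getLogger(RecordOverflowFilter.class);

    protected final int maxRecordSize;

    public RecordOverflowFilter(int maxRecordSize) {
        this.maxRecordSize = maxRecordSize;
    }

    /** Returns true when the record fits; otherwise logs a short warning and drops it. */
    public boolean accept(String docId, byte[] payload) {
        if (payload.length <= maxRecordSize) {
            return true;
        }
        // Log only the id and sizes, not the record content, to keep logs readable
        log.warn("Indexing request for doc: {} is too large: {}, max record size: {}, skipping",
                docId, payload.length, maxRecordSize);
        return false;
    }
}
```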
- causes: NXP-31551 Unable to start with GridFS (Resolved)