- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 6.0-HF08, 7.2
- Component/s: Elasticsearch
- Tags:
- Upgrade notes:
When indexing a field bigger than 32k, if the field uses the default keyword analyzer, then the maximum Lucene term size is exceeded and we get an error like:
[org.nuxeo.elasticsearch.core.ElasticSearchIndexingImpl] failure in bulk execution: [41]: index [nuxeo], type [doc], id [1469094a-1c14-4677-b19e-c209683032b2], message [IllegalArgumentException[Document contains at least one immense term in field="ecm:binarytext" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '...]...', original message: bytes can be at most 32766 in length; got 319109]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 319109]; ]
Possible workarounds:
- The ecm:binarytext field should not be indexed at all; this also saves disk space. Just change the mapping and add "index": "no" on the field (see NXP-16838).
- Use a truncate token filter on the default keyword analyzer, since it makes no sense to keep very long keywords. ~250 characters sounds like enough to provide sorting on title without taking too many resources.
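Both workarounds could be combined in the index settings and mapping. A minimal sketch, using the Elasticsearch 1.x-era syntax matching the affected versions (the analyzer and filter names "default_keyword" and "truncate_keyword", and the 256-character limit, are illustrative assumptions, not the shipped Nuxeo configuration):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "truncate_keyword": {
          "type": "truncate",
          "length": 256
        }
      },
      "analyzer": {
        "default_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["truncate_keyword"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "ecm:binarytext": {
          "type": "string",
          "index": "no"
        }
      }
    }
  }
}
```

With "index": "no" the field is stored in _source but produces no terms, so the 32766-byte limit can no longer be hit on it; the truncate filter caps any remaining keyword terms well below that limit.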