[NXP-16807] Elasticsearch fails to index document with token field bigger than 32k - Nuxeo Issue Tracker

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 6.0-HF08, 7.2
Fix Version/s: 6.0-HF10, 7.3
Component/s: Elasticsearch

Tags:
- nxRepoTeam
Upgrade notes:

This fix requires a mapping change, which can be applied from the Admin Center: Elasticsearch / Admin / Re-indexing all the repository.

Description

When a field larger than 32k is indexed with the default keyword analyzer, the whole value becomes a single term that exceeds the maximum Lucene term length (32766 bytes), and indexing fails with an error like:

[org.nuxeo.elasticsearch.core.ElasticSearchIndexingImpl] failure in bulk execution:
[41]: index [nuxeo], type [doc], id [1469094a-1c14-4677-b19e-c209683032b2], message [IllegalArgumentException[Document contains at least one immense term in field="ecm:binarytext" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '...]...', original message: bytes can be at most 32766 in length; got 319109]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 319109]; ]
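
The limit in the message applies to the UTF-8 encoded length of a single term, not to the character count. A minimal sketch of that check, assuming the value is indexed as one keyword term; the constant is taken from the error message above, and the function name is hypothetical:

```python
# Maximum UTF-8 length of a single term, as reported in the error message above.
MAX_TERM_BYTES = 32766


def exceeds_term_limit(value: str) -> bool:
    """Return True if `value`, kept whole as one keyword term, is too long to index."""
    return len(value.encode("utf-8")) > MAX_TERM_BYTES


if __name__ == "__main__":
    # A value the size of the one in the error message is rejected.
    print(exceeds_term_limit("a" * 319109))
    # A typical title is far below the limit.
    print(exceeds_term_limit("short title"))
```

Because the limit is counted in bytes, text with multi-byte characters reaches it with fewer characters than plain ASCII text does.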

Possible workarounds:

- The ecm:binarytext field should not be indexed at all, which also saves disk space. Change the mapping to add "index": "no" on the field (see NXP-16838).
- Use a truncate token filter in the default keyword analyzer, because very long keywords make no sense. About 250 characters should be enough to sort on titles without using too many resources.
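
The two workarounds could be combined in the index settings and mapping. The sketch below builds such a body as a Python dictionary and prints it as JSON. It assumes Elasticsearch 1.x-style mapping syntax ("type": "string", "index": "no"); the filter name `truncate_filter`, the function name, and the default length of 250 are illustrative choices, not necessarily what the actual fix uses.

```python
import json


def build_index_settings(max_keyword_chars: int = 250) -> dict:
    """Build a hypothetical settings/mapping body illustrating both workarounds."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Cut each token down to at most max_keyword_chars characters.
                    "truncate_filter": {
                        "type": "truncate",
                        "length": max_keyword_chars,
                    }
                },
                "analyzer": {
                    # Default analyzer: keep the value as one token, then truncate it.
                    "default": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["truncate_filter"],
                    }
                },
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    # Do not index this field at all (assumed 1.x syntax).
                    "ecm:binarytext": {"type": "string", "index": "no"}
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), indent=2))
```

With truncation at about 250 characters, the encoded term stays far below the 32766-byte limit even when every character takes several bytes.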

People

Assignee: Benoit Delbosc
Reporter: Benoit Delbosc
Participants: Benoit Delbosc, Jenkins
Votes: 0
Watchers: 3

Dates

Created: 2015-03-24 16:48
Updated: 2015-09-09 10:41
Resolved: 2015-04-01 09:40