Nuxeo Platform / NXP-16807

Elasticsearch fails to index a document with a token field larger than 32k


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.0-HF08, 7.2
    • Fix Version/s: 6.0-HF10, 7.3
    • Component/s: Elasticsearch
    • Tags:
    • Upgrade notes:

      This fix requires a mapping change. The new mapping can be applied from Admin Center / Elasticsearch / Admin by re-indexing the whole repository.


      Description

      When a field larger than 32k is indexed with the default keyword analyzer, the whole value becomes a single term that exceeds Lucene's maximum term length (32766 bytes in UTF-8), and indexing fails with an error like:

      [org.nuxeo.elasticsearch.core.ElasticSearchIndexingImpl] failure in bulk execution:
      [41]: index [nuxeo], type [doc], id [1469094a-1c14-4677-b19e-c209683032b2], message [IllegalArgumentException[Document contains at least one immense term in field="ecm:binarytext" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '...]...', original message: bytes can be at most 32766 in length; got 319109]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 319109]; ]
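To make the failure mode concrete, here is a minimal Java sketch (not Nuxeo or Lucene code) that mimics what a keyword analyzer does: it treats the whole field value as a single term and compares that term's UTF-8 length with the 32766-byte limit quoted in the log. The class and method names are hypothetical; the real check happens inside Lucene at indexing time.

```java
import java.nio.charset.StandardCharsets;

final class ImmenseTermCheck {
    // Lucene's limit on the UTF-8 length of a single indexed term,
    // as reported in the error message above.
    static final int MAX_TERM_BYTES = 32766;

    // A keyword analyzer emits the whole field value as one term,
    // so the term size is the UTF-8 length of the full value.
    static int keywordTermBytes(String fieldValue) {
        return fieldValue.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if indexing this value as a single keyword term would hit the limit.
    static boolean wouldBeRejected(String fieldValue) {
        return keywordTermBytes(fieldValue) > MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        String title = "A short document title";
        // Same size as the ecm:binarytext value in the log above (ASCII only).
        String binaryText = "x".repeat(319109);
        System.out.println(keywordTermBytes(title) + " bytes, rejected=" + wouldBeRejected(title));
        System.out.println(keywordTermBytes(binaryText) + " bytes, rejected=" + wouldBeRejected(binaryText));
    }
}
```

Because the limit is counted in bytes, non-ASCII text reaches it with fewer characters: 16384 copies of "\u00e9" (2 bytes each in UTF-8) already exceed it.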
      

      Possible workarounds:

      • The ecm:binarytext field should not be indexed at all, which also saves disk space. Change the mapping and add "index": "no" on this field (see NXP-16838).
      • Add a truncate token filter to the default keyword analyzer, since very long keywords make no sense. About 250 characters should be enough to sort on titles without consuming too many resources.
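The effect of a truncate token filter on a keyword term can be sketched in plain Java. This is an illustration under assumptions, not the Elasticsearch or Lucene implementation: `TruncateKeyword`, `truncate` and the 250-character cap are hypothetical names and values taken from the suggestion above, and the surrogate-pair guard is a choice made only in this sketch. Since a Java char encodes to at most 3 UTF-8 bytes (a surrogate pair, i.e. two chars, to 4), a 250-char term stays far below the 32766-byte limit.

```java
import java.nio.charset.StandardCharsets;

final class TruncateKeyword {
    // Hypothetical cap, based on the ~250 characters suggested above.
    static final int MAX_CHARS = 250;

    // Mimics a truncate filter applied after a keyword tokenizer:
    // the single keyword token is cut to at most maxChars characters.
    static String truncate(String token, int maxChars) {
        if (token.length() <= maxChars) {
            return token;
        }
        int end = maxChars;
        // This sketch avoids splitting a surrogate pair (e.g. an emoji) in half.
        if (Character.isHighSurrogate(token.charAt(end - 1))) {
            end--;
        }
        return token.substring(0, end);
    }

    public static void main(String[] args) {
        // 100000 copies of U+00E9, which is 2 bytes in UTF-8.
        String longTitle = "\u00e9".repeat(100_000);
        String key = truncate(longTitle, MAX_CHARS);
        int bytes = key.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(key.length() + " chars, " + bytes + " bytes");
    }
}
```

In Elasticsearch itself this is configured declaratively in the index settings, as a custom analyzer combining the keyword tokenizer with a truncate filter, rather than in Java code.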
