Nuxeo Platform / NXP-17862

Improve fulltext extraction

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0, 7.4
    • Fix Version/s: 8.3
    • Component/s: Core VCS

      Description

      As of 7.4, the fulltext is extracted by a job running in the default pool and then saved to the database by a second job (the fulltext updater), which runs in a dedicated single-thread pool to serialize database access.

      The serialization is needed because some database backends do not support concurrent updates on a field with a fulltext index.

      When creating a large number of documents, this can become a bottleneck: the updater work has a lower throughput than the extraction, so its queue keeps growing until it triggers a GC storm.
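The backlog effect described above can be reproduced in a minimal sketch. This is not Nuxeo's WorkManager code: the class name, the method name, and the latch standing in for a slow database are all hypothetical, and plain java.util.concurrent executors replace the real work queues. It only shows that a multi-threaded extraction stage feeding a single-threaded updater with an unbounded queue leaves almost every extracted text waiting in memory.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the two-job flow; not Nuxeo code.
final class TwoStageFulltextSketch {

    /** Returns how many updater jobs are still queued once every extraction has finished. */
    static int backlogAfterExtraction(int documents, int extractorThreads) {
        // Stage 1: extraction on a shared multi-threaded pool.
        ExecutorService extractors = Executors.newFixedThreadPool(extractorThreads);
        // Stage 2: a single-threaded updater with an unbounded queue (serialized db access).
        ThreadPoolExecutor updater = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch dbIsSlow = new CountDownLatch(1);      // stands in for a slow database
        CountDownLatch extracted = new CountDownLatch(documents);
        try {
            for (int i = 0; i < documents; i++) {
                final int docId = i;
                extractors.execute(() -> {
                    String text = "fulltext of doc " + docId; // pretend extraction
                    // The queued job keeps the extracted text in memory until it runs.
                    updater.execute(() -> {
                        try {
                            dbIsSlow.await();                 // updater stalls on the database
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        text.length();                        // pretend write to the database
                    });
                    extracted.countDown();
                });
            }
            extracted.await();
            // One job is held by the single updater thread; the rest wait in the queue.
            int backlog = updater.getQueue().size();
            dbIsSlow.countDown();
            updater.shutdown();
            updater.awaitTermination(10, TimeUnit.SECONDS);
            return backlog;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            extractors.shutdown();
            updater.shutdown();
        }
    }
}
```

In this sketch the backlog grows linearly with the number of documents, which is the memory pressure the description attributes to the updater queue.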

      When fulltext indexing is handled only by Elasticsearch (nuxeo.vcs.fulltext.search.disabled=true), the extraction process can be optimized by using a single job instead of two. These workers should also have a dedicated pool so that it can be tuned easily.
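As a rough sketch of the proposed single-job flow, the example below runs extraction and indexing in the same task on a pool whose size is a parameter. Everything here is an assumption made for illustration: the class and method names are hypothetical, `extract` is a trivial stand-in for real text extraction, and a concurrent map stands in for the Elasticsearch index. It is not the implementation delivered in 8.3.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a single extraction job on a dedicated pool; not Nuxeo code.
final class SingleJobFulltextSketch {

    /** Trivial stand-in for real fulltext extraction from a blob. */
    static String extract(String blob) {
        return blob.trim().toLowerCase(Locale.ROOT);
    }

    /** Extracts and indexes each document in one job, on a pool sized by the caller. */
    static Map<String, String> extractAndIndex(Map<String, String> blobsById, int poolSize) {
        // Dedicated pool: its size can be tuned without affecting other work.
        ExecutorService fulltextPool = Executors.newFixedThreadPool(poolSize);
        // Stand-in for the external index; concurrent writes are allowed,
        // so no serializing second job is needed.
        Map<String, String> index = new ConcurrentHashMap<>();
        for (Map.Entry<String, String> entry : blobsById.entrySet()) {
            fulltextPool.execute(() -> index.put(entry.getKey(), extract(entry.getValue())));
        }
        fulltextPool.shutdown();
        try {
            if (!fulltextPool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("fulltext jobs did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return index;
    }
}
```

Because there is no intermediate queue between extraction and storage, the extracted text is released as soon as its job completes.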

      ----------------

      The indexing flow is optimized by using the Elasticsearch bulk indexing feature when fulltext indexing is disabled at the repository level.
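The general idea behind bulk indexing is to group documents so that each request to the index carries many of them. The helper below is a hypothetical sketch of that grouping step only; the class name, method name, and bulk size are assumptions, and it does not call Elasticsearch or reflect how Nuxeo builds its bulk requests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of grouping documents into bulks; not Nuxeo code.
final class BulkBatchingSketch {

    /** Splits the documents into consecutive bulks of at most bulkSize items. */
    static <T> List<List<T>> toBulks(List<T> docs, int bulkSize) {
        if (bulkSize <= 0) {
            throw new IllegalArgumentException("bulkSize must be positive");
        }
        List<List<T>> bulks = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += bulkSize) {
            int end = Math.min(start + bulkSize, docs.size());
            // Copy the slice so each bulk is independent of the source list.
            bulks.add(new ArrayList<>(docs.subList(start, end)));
        }
        return bulks;
    }
}
```

With ten documents and a bulk size of four, this yields three bulks, so three requests would be sent instead of ten.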
