[NXP-17862] Improve fulltext extraction - Nuxeo Issue Tracker

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 6.0, 7.4
Fix Version/s: 8.3
Component/s: Core VCS

Tags:
Sprint: nxSL Sprint 8.2.1, nxSL Sprint 8.3.1
Story Points: 0

Description

As of 7.4, the fulltext is extracted by a job running in the default pool and saved to the database by a second job (the fulltext updater), which uses a dedicated pool with a single thread to serialize database access.

This serialization is needed because some database backends do not support concurrent updates on a field with a fulltext index.

When creating a large number of documents this can become a bottleneck: the updater work has a lower throughput, so its queue size keeps increasing until it causes a GC storm.
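
To make the current flow concrete, here is a minimal sketch using plain java.util.concurrent rather than the Nuxeo work manager. The class name, the extract and save methods, the pool size of 4 and the in-memory map standing in for the database are all assumptions made for illustration. It shows extraction running in a multi-threaded pool while every save is funneled through a single-threaded executor, so at most one write is in flight at a time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not Nuxeo code.
class TwoStageFulltext {
    // Stand-in for the database field holding the fulltext.
    final Map<String, String> db = new ConcurrentHashMap<>();
    final AtomicInteger activeWriters = new AtomicInteger();
    final AtomicInteger maxConcurrentWriters = new AtomicInteger();

    // Placeholder extraction; real code would convert blobs to text.
    static String extract(String content) {
        return content.toLowerCase();
    }

    // Saves the text and records how many writers ran at the same time.
    void save(String id, String text) {
        int now = activeWriters.incrementAndGet();
        maxConcurrentWriters.accumulateAndGet(now, Math::max);
        db.put(id, text);
        activeWriters.decrementAndGet();
    }

    void index(Map<String, String> docs) {
        ExecutorService defaultPool = Executors.newFixedThreadPool(4);
        // One thread only: updates are serialized, and pending updates pile up in its queue.
        ExecutorService updaterPool = Executors.newSingleThreadExecutor();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            defaultPool.execute(() -> {
                String text = extract(doc.getValue());                // job 1: extraction
                updaterPool.execute(() -> save(doc.getKey(), text));  // job 2: update
            });
        }
        try {
            defaultPool.shutdown();
            defaultPool.awaitTermination(10, TimeUnit.SECONDS);
            updaterPool.shutdown();
            updaterPool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the updater executor has an unbounded queue, any gap between extraction throughput and update throughput accumulates there as queued jobs, which is the memory pressure described above.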

When the fulltext index is handled only by Elasticsearch (nuxeo.vcs.fulltext.search.disabled=true), the extraction process can be optimized by using a single job instead of two. These workers should also have a dedicated pool so that it can be tuned easily.
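
A rough sketch of the proposed single-job variant, again with plain java.util.concurrent and not the actual Nuxeo implementation. The class name, the poolSize parameter and the map standing in for the search index are hypothetical. Since no database fulltext field is updated, nothing needs to be serialized, and each job can extract and hand off the text in one step.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Nuxeo code.
class SingleJobFulltext {
    // Stand-in for the Elasticsearch index.
    final Map<String, String> searchIndex = new ConcurrentHashMap<>();
    // Size of the dedicated pool; this is the tuning knob.
    final int poolSize;

    SingleJobFulltext(int poolSize) {
        this.poolSize = poolSize;
    }

    // Placeholder extraction; real code would convert blobs to text.
    static String extract(String content) {
        return content.toLowerCase();
    }

    void index(Map<String, String> docs) {
        ExecutorService fulltextPool = Executors.newFixedThreadPool(poolSize);
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // One job per document: extract and index, with no second queue in between.
            fulltextPool.execute(() -> searchIndex.put(doc.getKey(), extract(doc.getValue())));
        }
        fulltextPool.shutdown();
        try {
            fulltextPool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a dedicated pool, the number of fulltext threads can be changed without affecting other work scheduled on the default pool.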

----------------

The indexing flow is optimized by using the ES bulk indexing features when fulltext indexing is disabled at the repository level.
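
The batching idea behind bulk indexing can be sketched as follows. This does not use the Elasticsearch client: the BulkIndexer class, its batchSize parameter and the sentRequests list that records each simulated bulk request are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Nuxeo or Elasticsearch code.
class BulkIndexer {
    final int batchSize;
    // Each element represents one simulated bulk request and the document ids it carried.
    final List<List<String>> sentRequests = new ArrayList<>();
    private List<String> pending = new ArrayList<>();

    BulkIndexer(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffers a document and sends a bulk request once the batch is full.
    void add(String docId) {
        pending.add(docId);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; call once at the end to drain a partial batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sentRequests.add(pending);
        pending = new ArrayList<>();
    }
}
```

Grouping documents this way replaces one request per document with one request per batch, at the cost of having to flush the last partial batch explicitly.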

Issue Links

depends on:
NXBT-1018 Bench: Test with bigger default worker pool size (Resolved)
NXP-17934 Improve audit performance and reliability (Resolved)

is related to:
NXP-25716 Simplify fulltext extraction (Resolved)

is required by:
NXBT-1028 Bench: Analyze audit and fulltext updater refactoring (Resolved)
NXP-19128 Reduce job serialization footprint (Resolved)

People

Assignee: Stéphane Lacoin
Reporter: Benoit Delbosc
Participants: Benoit Delbosc, Jenkins, Stéphane Lacoin
Votes: 0
Watchers: 3

Dates

Created: 2015-09-16 13:14
Updated: 2018-09-04 16:43
Resolved: 2016-04-19 07:55