[NXP-25716] Simplify fulltext extraction - Nuxeo Issue Tracker

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 10.3
Component/s: Core, Events / Works, Streams

Tags:
Impact type: Configuration Change
Upgrade notes:

Fulltext maximum size is now 128 KB by default. To change this, the repository configuration can be updated to use another fieldSizeLimit value:

<fulltext ... fieldSizeLimit="1048576"> ... </fulltext>

A value of 0 means no limit.

See https://doc.nuxeo.com/nxdoc/repository-configuration/#full-text for more
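
The effect of the limit described above can be sketched as follows. This is a minimal illustration, not Nuxeo's actual code: the class `FulltextLimit` and the method `apply` are hypothetical names, the limit is assumed to count Java characters rather than bytes, and negative values are assumed to behave like 0.

```java
// Hypothetical helper illustrating the fieldSizeLimit semantics; not Nuxeo's implementation.
final class FulltextLimit {

    /** Default from the upgrade note: 128 KB, taken here as 128 * 1024 (assumption: counted in chars). */
    static final int DEFAULT_FIELD_SIZE_LIMIT = 128 * 1024;

    /**
     * Returns the text cut to at most fieldSizeLimit characters.
     * A limit of 0 means no limit; negative values are treated the same way (assumption).
     */
    static String apply(String text, int fieldSizeLimit) {
        if (text == null || fieldSizeLimit <= 0 || text.length() <= fieldSizeLimit) {
            return text;
        }
        return text.substring(0, fieldSizeLimit);
    }

    public static void main(String[] args) {
        String big = "x".repeat(200_000);
        // Length after applying the default limit, the 1048576 limit from the note, and no limit.
        System.out.println(apply(big, DEFAULT_FIELD_SIZE_LIMIT).length());
        System.out.println(apply(big, 1048576).length());
        System.out.println(apply(big, 0).length());
    }
}
```

With the default, only text longer than the limit is cut; raising the limit or setting it to 0 leaves the same input untouched.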
Sprint: nxFG 10.3.5, nxFG 10.3.6
Story Points: 3

Description

FulltextUpdaterWork was created by NXP-10864 to write newly extracted fulltext into the database from a single thread, to avoid overloading some databases. It is separate from FulltextExtractorWork, which converts the binary into text but currently does not store that text.

Because the extracted text has to be passed to a second work, it is stored in a queue for asynchronous processing. However, this text may be very large, easily several megabytes, which is a problem because queues like Kafka or Chronicle are not designed to work efficiently with such large messages.

The original problem that led to this design came from a database we rarely use (SQL Server). In addition, we now have other means of solving the concurrency problem (retry on ConcurrentUpdateException).
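
As an illustration of the retry approach mentioned above, here is a minimal sketch. It does not use Nuxeo's API: the `ConcurrentUpdateException` below is a local stand-in for the exception named in the description, and `RetryOnConflict`, `withRetry` and `maxAttempts` are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical sketch of retrying a write that may fail on a concurrent update.
final class RetryOnConflict {

    /** Local stand-in for the ConcurrentUpdateException named in the description. */
    static final class ConcurrentUpdateException extends RuntimeException {
        ConcurrentUpdateException(String message) {
            super(message);
        }
    }

    /** Runs the action, retrying on ConcurrentUpdateException up to maxAttempts times in total. */
    static <T> T withRetry(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConcurrentUpdateException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (ConcurrentUpdateException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt hit a conflict
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated write that conflicts twice before succeeding.
        String result = withRetry(3, () -> {
            if (++calls[0] < 3) {
                throw new ConcurrentUpdateException("simulated conflict");
            }
            return "stored";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying at the point of failure lets several threads write concurrently, instead of funnelling every write through a single thread to avoid conflicts.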

-> Merge FulltextUpdaterWork back into FulltextExtractorWork.
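
The shape of the merged design can be sketched roughly as below: one unit of work converts the blob and stores the result directly, so the extracted text never travels through a queue. All names here (`MergedFulltextWork`, `work`, `storedText`) are hypothetical, and the in-memory map is only a stand-in for the repository's fulltext storage.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a single work that both extracts and stores fulltext.
final class MergedFulltextWork {

    // In-memory stand-in for the database fulltext storage (assumption for illustration).
    private final Map<String, String> store = new HashMap<>();

    // Converter from binary content to text (stand-in for the real conversion).
    private final Function<byte[], String> converter;

    MergedFulltextWork(Function<byte[], String> converter) {
        this.converter = converter;
    }

    /** Extracts the text and writes it in the same step; no intermediate queue entry holds the text. */
    void work(String docId, byte[] blob) {
        String text = converter.apply(blob);
        store.put(docId, text);
    }

    String storedText(String docId) {
        return store.get(docId);
    }

    public static void main(String[] args) {
        MergedFulltextWork work = new MergedFulltextWork(
                blob -> new String(blob, StandardCharsets.UTF_8));
        work.work("doc-1", "hello fulltext".getBytes(StandardCharsets.UTF_8));
        System.out.println(work.storedText("doc-1"));
    }
}
```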

Issue Links

depends on

NXP-5689 VCS: store PostgreSQL fulltext in clear text (Resolved)
NXP-10864 Write to fulltext table single-threaded (Resolved)
NXP-8797 Make fulltext normalization algorithm configurable (Resolved)

is related to

NXP-14672 Factor common code for VCS and DBS (Resolved)
NXP-17862 Improve fulltext extraction (Resolved)
NXP-25279 Make the raw binary text available for processing (Resolved)
NXP-26618 FulltextUpdaterWork class not found after migration (Resolved)

is required by

NXP-25781 Allow per-field fulltext indexing for MongoDB (Resolved)
NXP-25359 FulltextExtractorWork should only get called once per blob (Resolved)

People

Assignee: Florent Guillaume
Reporter: Florent Guillaume
Participants: Benoit Delbosc, Florent Guillaume, Jenkins, Support Tech User
Votes: 0
Watchers: 4

Dates

Created: 2018-09-03 13:33
Updated: 2021-09-09 13:06
Resolved: 2018-09-21 23:17

Time Tracking

Estimated:
Remaining:
Logged: 3d 2h