  1. Nuxeo Platform
  2. NXP-25716

Simplify fulltext extraction

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.3
    • Component/s: Core, Events / Works, Streams
    • Impact type:
      Configuration Change
    • Upgrade notes:

      Fulltext maximum size is now 128 KB by default. To change this, the repository configuration can be updated to use another fieldSizeLimit value:

      <fulltext ... fieldSizeLimit="1048576">
        ...
      </fulltext>
      

      A value of 0 means no limit.

      See https://doc.nuxeo.com/nxdoc/repository-configuration/#full-text for more details.
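
To illustrate what the limit means in practice, here is a minimal sketch of the truncation semantics described above, including the "0 means no limit" rule. It is not Nuxeo's implementation: the class and method names are hypothetical, and it assumes the limit is counted in characters, which the notes above do not specify.

```java
public class FulltextLimitSketch {

    // Default stated in the upgrade notes: 128 KB (128 * 1024 = 131072).
    static final int DEFAULT_FIELD_SIZE_LIMIT = 128 * 1024;

    /**
     * Hypothetical helper: truncates extracted text to fieldSizeLimit.
     * Assumption: the limit is a character count. A value of 0 means no limit.
     */
    static String limit(String text, int fieldSizeLimit) {
        if (fieldSizeLimit < 0) {
            throw new IllegalArgumentException("fieldSizeLimit must be >= 0");
        }
        if (text == null || fieldSizeLimit == 0 || text.length() <= fieldSizeLimit) {
            return text; // nothing to cut
        }
        // Note: a plain substring may split a surrogate pair at the boundary.
        return text.substring(0, fieldSizeLimit);
    }

    public static void main(String[] args) {
        System.out.println(limit("abcdef", 4)); // truncated to 4 characters
        System.out.println(limit("abcdef", 0)); // 0 = no limit, text unchanged
    }
}
```

With fieldSizeLimit="1048576" as in the configuration example, the same logic would keep up to 1 MB (under the character-count assumption) instead of the 128 KB default.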

    • Sprint:
      nxFG 10.3.5, nxFG 10.3.6
    • Story Points:
      3

      Description

      FulltextUpdaterWork was created by NXP-10864 to write newly extracted fulltext into the database from a single thread, in order to avoid overloading some databases. It is separate from FulltextExtractorWork, which converts the binary into text but currently does not store that text.

      Because the extracted text has to be passed to a second work, it is stored in a queue for asynchronous processing. However, this text may be very large, easily several megabytes, which is a problem because queues like Kafka or Chronicle are not designed to work efficiently with such large messages.

      The original problem leading to this design affected a database we rarely use (SQL Server). In addition, we now have other means of solving the concurrency problem (retry on ConcurrentUpdateException).

      => Merge FulltextUpdaterWork back into FulltextExtractorWork.
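
The "retry on ConcurrentUpdateException" approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the exception class here is a local stand-in for the real one, and RetrySketch, withRetry and maxAttempts are hypothetical names, not part of the Nuxeo API.

```java
import java.util.function.Supplier;

public class RetrySketch {

    /** Local stand-in for the platform's ConcurrentUpdateException (assumption). */
    public static class ConcurrentUpdateException extends RuntimeException {
        public ConcurrentUpdateException(String message) {
            super(message);
        }
    }

    /**
     * Runs the action, retrying when a concurrent update is detected.
     * Rethrows the last exception once maxAttempts is exhausted.
     */
    static <T> T withRetry(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConcurrentUpdateException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (ConcurrentUpdateException e) {
                last = e; // another writer won the race; try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated write that conflicts twice before succeeding.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new ConcurrentUpdateException("conflict");
            }
            return "saved";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a merged work, the write of the extracted text could be wrapped in such a retry, so the text never needs to travel through a queue to a second work. A real implementation would likely also add a delay between attempts.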

      Time Tracking

      • Estimated: 0m
      • Remaining: 0m
      • Logged: 3d 2h