  1. Nuxeo Platform
  2. NXP-26704

Allow storing extracted fulltext in blobs

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.10-HF25, 11.1, 2021.0
    • Component/s: Core
    • Impact type: Configuration Change
    • Upgrade notes:

      Fulltext extracted from binaries can be stored in a blob provider instead of metadata in the repository by defining:

      nuxeo.vcs.fulltext.storedInBlob=true
      

      (Note that despite the vcs in the name, which is here for regularity with other properties, it also applies to DBS/MongoDB.)

      When doing so, by default a BlobProvider named fulltext will be used to store these blobs. When using a custom blob provider configuration instead of the default local filesystem storage, this fulltext blob provider must be defined accordingly. Usage of this specific blob provider is configured through a blob dispatcher in the default configuration, which may be overridden if needed.
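
      As a rough illustration of what defining this blob provider could look like, here is a minimal sketch of a contribution to the BlobManager configuration extension point. The component name, the class, and the property are hypothetical placeholders, not values taken from this issue; the class and its properties should mirror whatever implementation the custom default blob provider already uses.

      ```xml
      <!-- Sketch only: component name, class and property are placeholders -->
      <component name="com.example.blobprovider.fulltext">
        <extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
          <!-- "fulltext" is the provider name expected by the default configuration -->
          <blobprovider name="fulltext">
            <!-- Hypothetical class: use the same implementation as the custom default provider -->
            <class>com.example.blob.MyCustomBlobProvider</class>
            <!-- Hypothetical property: keep fulltext blobs apart from regular binaries -->
            <property name="exampleStorageLocation">fulltext</property>
          </blobprovider>
        </extension>
      </component>
      ```

      Keeping the name fulltext avoids having to override the default blob dispatcher; any other name would also require a matching dispatcher rule.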

      When defining additional repositories, fulltext blob storage will need to be enabled with XML in the repository contribution:

      <fulltext ... storedInBlob="true" ... />
      

      and a custom blob dispatcher configuration will be needed to take into account this repository.

      Note that when fulltext blob storage is enabled, repository-based fulltext search is automatically disabled (equivalent to nuxeo.vcs.fulltext.search.disabled=true or <fulltext ... searchDisabled="true" ... />).

    • Team: FG
    • Sprint: nxFG 11.1.13
    • Story Points: 8

      Description

      Fulltext extracted from the binaries is currently stored in the document itself as two text system properties. This is useful when we want to index them as fulltext fields in the database itself, but this use case is increasingly irrelevant given that people use Elasticsearch.

      By storing the fulltext "offline" in a blob we could considerably reduce the size of the document, and still have the ability to access it when required by Elasticsearch for indexing.

    People

    • Votes: 0
    • Watchers: 4

    Time Tracking

    • Estimated: Not Specified
    • Remaining: 0m
    • Logged: 1d 3h
