- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Component/s: Binary Metadata
Binary Searchable Text Extraction is a sub-process of the overall Full Text Extraction process, which is initiated during data import.
The overall Full Text Extraction process already has a Blacklist/Whitelist driven by DocType or Facet values, but there is no equivalent list specifically for the Binary Searchable Text Extraction sub-process.
Goal: Have a document full-text indexed (metadata, technical data, etc.) but NOT have its Binary Searchable Text extracted, for specific documents selected by Nuxeo DocType or Facet.
Purpose: Increase import speed, because an additional process (Binary Searchable Text Extraction) is not performed, and reduce the space used in the database and Elasticsearch, because the Binary Searchable Text is never gathered or stored for the specified Nuxeo DocTypes/Facet values.
AC:
Be able to provide a configured Blacklist that allows Full Text Extraction but NOT Binary Searchable Text Extraction for the specified list of Nuxeo DocTypes or Facet values.
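To make the requested behavior concrete, here is a minimal sketch of the decision logic in Python. It is not Nuxeo code and does not use any Nuxeo API: `ExtractionExclusions`, `ExtractionPlan`, and `plan_extraction` are hypothetical names, and the DocType and Facet values are placeholders chosen for illustration. It assumes the binary text blacklist is only consulted when the document is not already excluded from full text extraction as a whole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionExclusions:
    """Hypothetical blacklist keyed by document type or facet."""
    doc_types: frozenset = frozenset()
    facets: frozenset = frozenset()

    def matches(self, doc_type, facets):
        # A document is blacklisted if its type is listed
        # or if it carries at least one listed facet.
        return doc_type in self.doc_types or bool(self.facets & set(facets))


@dataclass(frozen=True)
class ExtractionPlan:
    index_fulltext: bool        # metadata, technical data, etc.
    extract_binary_text: bool   # text pulled out of attached binaries


def plan_extraction(doc_type, facets, fulltext_excluded, binary_text_excluded):
    """Decide which extraction steps run for one imported document."""
    index_fulltext = not fulltext_excluded.matches(doc_type, facets)
    # Assumption: binary text extraction is a sub-process, so it never runs
    # when full text extraction is skipped for the document.
    extract_binary_text = (
        index_fulltext and not binary_text_excluded.matches(doc_type, facets)
    )
    return ExtractionPlan(index_fulltext, extract_binary_text)


# Placeholder configuration values, for illustration only.
fulltext_excluded = ExtractionExclusions(doc_types=frozenset({"Folder"}))
binary_text_excluded = ExtractionExclusions(
    doc_types=frozenset({"Video"}),
    facets=frozenset({"NoBinaryText"}),
)

video_plan = plan_extraction("Video", [], fulltext_excluded, binary_text_excluded)
file_plan = plan_extraction("File", [], fulltext_excluded, binary_text_excluded)
```

With this configuration, a `Video` document is still full-text indexed but its binary text is skipped, while a `File` document goes through both steps. Keeping the two lists separate leaves the existing Blacklist/Whitelist behavior unchanged.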