  1. Nuxeo Platform
  2. NXP-17692

Improve Elasticsearch fulltext analyzer to support Unicode wildcard search and HTML

    Details

      Description

      simple_query_string queries that use a wildcard pattern are not analyzed, only lowercased.
      For instance, Déjà is indexed as deja, so the following searches work: d*, dej*, De*. The following search fails: dé*

      To prevent this, we should preserve the original token in the asciifolding filter, so that both the folded and the accented forms are indexed.
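
      The effect of preserving the original token can be sketched in plain Python, without an Elasticsearch cluster. This is only an approximation: the helper names (fold, index_terms, wildcard_matches) are hypothetical, and Unicode decomposition stands in for the real asciifolding filter, which covers more characters.

```python
import unicodedata
from fnmatch import fnmatchcase


def fold(token):
    """Approximate asciifolding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def index_terms(word, preserve_original):
    """Terms emitted at index time for one word (lowercase + folding)."""
    token = unicodedata.normalize("NFC", word).lower()
    terms = {fold(token)}
    if preserve_original:
        # Assumption: mirrors asciifolding with preserve_original enabled,
        # which keeps the unfolded token next to the folded one.
        terms.add(token)
    return terms


def wildcard_matches(pattern, terms):
    """Wildcard patterns are only lowercased, never folded."""
    lowered = unicodedata.normalize("NFC", pattern).lower()
    return any(fnmatchcase(term, lowered) for term in terms)


if __name__ == "__main__":
    for preserve in (False, True):
        terms = index_terms("Déjà", preserve)
        for pattern in ("d*", "dej*", "De*", "dé*"):
            print(preserve, pattern, wildcard_matches(pattern, terms))
```

      With preserve_original set to False, only deja is indexed and dé* finds nothing; with it set to True, déjà is indexed as well and the accented pattern matches.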

      Also, HTML content should be handled at indexing time so that Déjà written with HTML entities is indexed as déjà and not as d eacute j agrave.

      Adding the html_strip char filter does the job.
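
      As a rough illustration of what such a char filter does before tokenization, the sketch below removes tags and decodes entities with the Python standard library. The function strip_html is a hypothetical stand-in, not the Elasticsearch implementation, and its tag handling is deliberately naive.

```python
import html
import re

# Naive tag pattern; good enough for an illustration, not for real HTML.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html(text):
    """Replace tags with spaces, then decode entities such as &eacute;."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return html.unescape(without_tags)


if __name__ == "__main__":
    raw = "<p>D&eacute;j&agrave; vu</p>"
    print(strip_html(raw).split())
```

      Without this step, a tokenizer that splits on punctuation sees the entity D&eacute;j&agrave; as separate fragments, which matches the d eacute j agrave terms described above.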

      The default fulltext analyzer can now search for accented words with a wildcard (déj*) and supports HTML tag conversion.
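
      A minimal sketch of analysis settings that combine both changes is shown below, built as a Python dictionary and printed as JSON. The analyzer name fulltext, the filter name asciifolding_preserve, the standard tokenizer and the short filter chain are assumptions made for illustration; the actual Nuxeo default configuration may contain other filters.

```python
import json

# Hypothetical settings fragment; names and filter chain are assumptions.
settings = {
    "analysis": {
        "filter": {
            "asciifolding_preserve": {
                "type": "asciifolding",
                # Keep the accented token next to the folded one.
                "preserve_original": True,
            }
        },
        "analyzer": {
            "fulltext": {
                "type": "custom",
                # Strip tags and decode entities before tokenizing.
                "char_filter": ["html_strip"],
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding_preserve"],
            }
        },
    }
}

if __name__ == "__main__":
    print(json.dumps(settings, indent=2))
```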
