[NXP-17692] Improve Elasticsearch fulltext analyzer to support unicode wildcard search and html - Nuxeo Issue Tracker

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 6.0
Fix Version/s: 5.8.0-HF36, 6.0-HF22, 7.4
Component/s: Elasticsearch

Tags:
- nxRepoTeam
- release_notes_added

Description

simple_query_string queries that use a wildcard pattern are not analyzed; the pattern is only lowercased.
For instance, Déjà is indexed as deja, so the following searches work: d*, dej*, De*, while the following search fails: dé*

To prevent this, we should preserve the original token in the asciifolding filter, so that both the folded and the accented forms are indexed.
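
The behaviour described above can be illustrated with a small simulation. This is only a sketch in plain Python, not Elasticsearch code: `ascii_fold`, `index_terms` and `wildcard_matches` are hypothetical helpers, the folding is approximated by stripping combining marks, and wildcard matching is approximated with `fnmatchcase`.

```python
import unicodedata
from fnmatch import fnmatchcase


def ascii_fold(token):
    """Approximate ASCII folding by dropping combining marks (assumption)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(word, preserve_original):
    """Return the terms that would be indexed for a single word."""
    lowered = unicodedata.normalize("NFC", word).lower()
    terms = {ascii_fold(lowered)}
    if preserve_original:
        # Keep the accented form next to the folded one.
        terms.add(lowered)
    return terms


def wildcard_matches(pattern, terms):
    """A wildcard pattern is only lowercased, never folded."""
    lowered = unicodedata.normalize("NFC", pattern).lower()
    return any(fnmatchcase(term, lowered) for term in terms)


folded_only = index_terms("Déjà", preserve_original=False)
with_original = index_terms("Déjà", preserve_original=True)

for pattern in ("d*", "dej*", "De*", "dé*"):
    print(pattern,
          wildcard_matches(pattern, folded_only),
          wildcard_matches(pattern, with_original))
```

With only the folded term indexed, the `dé*` pattern finds nothing because no indexed term contains `é`; once the original token is kept, all four patterns match.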

Also, HTML content should be taken into account by the analyzer, so that HTML-encoded text such as D&eacute;j&agrave; is indexed as déjà and not as d eacute j agrave.

Adding the html_strip char filter does the job.
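
The effect of stripping HTML before tokenizing can be sketched as follows. This is an approximation in plain Python, not the html_strip implementation: `strip_html` and `naive_tokens` are hypothetical helpers, tags are removed with a simple regular expression, and entities are decoded with the standard library's `html.unescape`.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text):
    """Replace tags with spaces, then decode entities such as &eacute;."""
    without_tags = TAG_RE.sub(" ", text)
    return html.unescape(without_tags)


def naive_tokens(text):
    """Very rough stand-in for a tokenizer followed by lowercasing."""
    return re.findall(r"\w+", text.lower())


raw = "<p>D&eacute;j&agrave; vu</p>"

# Without stripping, the entity names end up as separate tokens.
print(naive_tokens("D&eacute;j&agrave;"))

# With stripping, the accented word survives as a single token.
print(naive_tokens(strip_html(raw)))
```

Running the char filter before the tokenizer matters here: once the text has been split on `&` and `;`, the entity can no longer be decoded.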
–
The default fulltext analyzer can now search accented words with a wildcard (for example déj*) and supports HTML tag conversion.
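
An analyzer combining both changes might look like the settings below, built here as a Python dictionary and printed as JSON. This is a simplified sketch, not the actual Nuxeo configuration: the names `fulltext` and `asciifolding_preserve` and the choice of the `standard` tokenizer are assumptions, and the real default analyzer may chain additional filters. The `html_strip` char filter and the `preserve_original` option of the `asciifolding` token filter are standard Elasticsearch features.

```python
import json

# Hypothetical index settings; only the names marked above are assumptions.
settings = {
    "analysis": {
        "filter": {
            "asciifolding_preserve": {
                "type": "asciifolding",
                # Index the accented token in addition to the folded one.
                "preserve_original": True,
            }
        },
        "analyzer": {
            "fulltext": {
                "type": "custom",
                # Decode entities and drop tags before tokenizing.
                "char_filter": ["html_strip"],
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding_preserve"],
            }
        },
    }
}

print(json.dumps(settings, indent=2, ensure_ascii=False))
```

Because the analyzer changes what is stored in the index, existing documents would need to be reindexed before the new behaviour applies to them.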

People

Assignee:

Benoit Delbosc

Reporter:

Benoit Delbosc

Participants:

Benoit Delbosc, Jenkins

Dates

Created:

2015-08-13 10:55

Updated:

2015-10-20 14:14

Resolved:

2015-08-13 13:46