Nuxeo Platform / NXP-25315

More Like This Elasticsearch hint

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.2
    • Component/s: Elasticsearch
    • Release Notes Description:
      More Like This Hint

      A new hint is available that lets NXQL queries use the More Like This query of Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html).

      Ex: SELECT * FROM Document WHERE /*+ES: INDEX(dc:title.fulltext,dc:description.fulltext) OPERATOR(more_like_this) */ ecm:uuid = '1234'

      This query takes the most frequent terms of the title and description of document 1234 and finds documents that also match those terms.

    • Sprint: nxcore 10.2.8
    • Story Points: 1

      Description

      We could extend our Elasticsearch hint with a more_like_this operator, based on the Elasticsearch More Like This query:
      https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

      For instance:

      SELECT * FROM Document WHERE /*+ES: INDEX(dc:title.fulltext,dc:description.fulltext) OPERATOR(more_like_this) */ ecm:uuid = '1234'
      

      This is translated to the following Elasticsearch query:

      {
        "more_like_this" : {
          "like" : [
            "dc:title.fulltext",
            "dc:description.fulltext",
            {
              "_index" : "nxutest",
              "_type" : "doc",
              "_id" : "1234"
            }
          ],
          "max_query_terms" : 12,
          "min_term_freq" : 1,
          "min_doc_freq" : 5,
          "max_doc_freq" : 2147483647,
          "min_word_length" : 0,
          "max_word_length" : 0,
          "minimum_should_match" : "30%",
          "boost_terms" : 0.0,
          "include" : false,
          "fail_on_unsupported_field" : true,
          "boost" : 1.0
        }
      }
      

      Multiple document ids can also be referenced:

      SELECT * FROM Document WHERE /*+ES: INDEX(all_field) OPERATOR(more_like_this) */ ecm:uuid IN ('1234', '4567')
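
      The two query forms above (a single id with `=`, several ids with `IN`) can be generated with a small helper. The following is a minimal sketch, not part of Nuxeo: the function name `build_mlt_nxql` and its parameters are hypothetical, and it rejects ids containing a single quote rather than assuming an escaping rule.

```python
def build_mlt_nxql(fields, doc_ids, doc_type="Document"):
    """Build an NXQL query string carrying the more_like_this hint.

    Hypothetical helper for illustration only; it just formats the
    query text shown in this ticket and does not call Nuxeo.
    """
    if not fields or not doc_ids:
        raise ValueError("at least one field and one document id are required")
    for doc_id in doc_ids:
        # Refuse quotes instead of guessing how they should be escaped.
        if "'" in doc_id:
            raise ValueError("document id must not contain a single quote")

    # The hint lists the indexed fields to compare, comma-separated.
    hint = "/*+ES: INDEX({}) OPERATOR(more_like_this) */".format(",".join(fields))

    # One id uses '=', several ids use 'IN (...)', as in the examples above.
    if len(doc_ids) == 1:
        predicate = "ecm:uuid = '{}'".format(doc_ids[0])
    else:
        quoted = ", ".join("'{}'".format(d) for d in doc_ids)
        predicate = "ecm:uuid IN ({})".format(quoted)

    return "SELECT * FROM {} WHERE {} {}".format(doc_type, hint, predicate)
```

      Calling `build_mlt_nxql(["all_field"], ["1234", "4567"])` yields the multi-document query shown above.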
      

      Note that min_doc_freq is 5 (the Elasticsearch default), so the operation only uses terms that appear in at least 5 documents. Refer to the documentation for more information:
      https://www.elastic.co/guide/en/elasticsearch/reference/6.x/query-dsl-mlt-query.html
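
      To illustrate why min_doc_freq matters, here is a simplified sketch of the term filtering step. It is an assumption-laden model, not the Elasticsearch implementation: the function `select_query_terms` and its inputs are hypothetical, and it ranks candidates by raw term frequency, whereas Elasticsearch scores candidate terms with tf-idf.

```python
from collections import Counter


def select_query_terms(source_terms, doc_freq,
                       min_term_freq=1, min_doc_freq=5, max_query_terms=12):
    """Pick the terms of a source document that would be kept for matching.

    source_terms: list of terms found in the source document (hypothetical input).
    doc_freq: dict mapping a term to the number of indexed documents containing it.
    Defaults mirror the values shown in the translated query above.
    """
    term_freq = Counter(source_terms)
    candidates = [
        (term, count)
        for term, count in term_freq.items()
        # Drop terms that are too rare in the source document or in the index.
        if count >= min_term_freq and doc_freq.get(term, 0) >= min_doc_freq
    ]
    # Simplification: most frequent first, ties broken alphabetically.
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    return [term for term, _ in candidates[:max_query_terms]]
```

      With these defaults, a term present in fewer than 5 documents is discarded, so on a very small index the query may end up with no terms and return no results.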

    People

    • Votes: 0
    • Watchers: 3

    Time Tracking

    • Original Estimate: 0m
    • Remaining Estimate: 0m
    • Time Spent: 2h