Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-11521

Add a page provider option to get fast sorting

    XMLWordPrintable

    Details

      Description

      To get good performance to retrieve a page of long sorted result list, the database should not do a full scan + sort to get the expected slice of results. Simply because it takes seconds to perform when you need 100k results among 1m of documents.

      The only way to get very fast response is to force the planner to use an index on the sorted column.

      Here is the most simple case:

      SELECT id FROM dublincore ORDER BY title LIMIT 10 OFFSET 0;
      

      translated into Oracle:

      SELECT C1, RNUM  FROM (
        SELECT /*+ FIRST_ROWS(10) */ a.C1, ROWNUM rnum FROM (
         SELECT ID AS C1 FROM DUBLINCORE ORDER BY TITLE) a 
        WHERE ROWNUM <= 10)
      WHERE rnum > 0
      

      The CBO (Oracle planner) will use an existing index on the TITLE column only if there are no NULL value.

      This means that either there is a constraint on the table to say that title IS NOT NULL, either we add an explicit clause "title IS NOT NULL" to the query.

      PostgreSQL is smart enough to not requires such constraint.

      LEFT Join

      Even with a "NOT NULL" constraint at the table level, if we have a LEFT JOIN, Oracle will not use the index because it is possible that the join brings NULL value for the sort column.

      This is the case here:

      SELECT C1, RNUM  FROM (
        SELECT   /*+ FIRST_ROWS(100) */ a.C1, ROWNUM rnum FROM (
          SELECT HIERARCHY.ID AS C1 FROM HIERARCHY
          LEFT JOIN DUBLINCORE F2 ON HIERARCHY.ID = F2.ID WHERE (HIERARCHY.PRIMARYTYPE = 'File')
          ORDER BY F2.TITLE) a 
        WHERE ROWNUM <= 100)
      WHERE rnum > 0;
      

      Using a default "INNER JOIN" instead of the "LEFT JOIN" solves the problem, but this is not always possible.

      Adding an explicit clause "AND F2.TITLE IS NOT NULL" do work and the planner choose to use the index.

      UNION ALL

      There is no way (AFAIK) to use an efficient index sort when using a UNION ALL query. Because the order is done after doing the union of 2 unsorted sub query.

      So if you need to search on document and proxies it will not work. Fast paging requires to add an explicit NXQL clause "AND ecm:isproxy=0".

      Clauses selectivity

      The planner will choose the sort index only if other clauses are not enough selective. If there are some selective clauses it is better to do the sort at the end.

      Note that the security may be a selective clause.

      Client language

      Finally Oracle will not use the index for sorting if the client (JDBC) does not speak the same language as the database, because the sort depends on the locale. The JDBC locale is taken from the JVM user.language and user.country.

      So for a default english database you must set explicitly the following java options:

      JAVA_OPTS=$JAVA_OPTS -Duser.language=en -Duser.country=US 
      

      Conclusion

      The db will use super fast plan for paging long results only if:

      1. you have an extra clause "AND column_use_to_sort IS NOT NULL"
      2. you don't use proxies: "AND ecm:isproxy=0"
      3. the other clauses or security are not selective
      4. For Oracle the JVM user.language and user.country match the database

      When using page provider the results can be ordered using different column and the point 1. must be handled dynamically. The idea is to add an explicit option (optimized_sort) at the page provider level.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: