Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-5689

VCS: store PostgreSQL fulltext in clear text

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.3.2
    • Fix Version/s: 5.5
    • Component/s: Core VCS
    • Environment:
      PostgreSQL
    • Tags:
    • Impact type:
      Content model Change
    • Upgrade notes:
      Hide

      When migrating a database from the previous model for the fulltext table to the new one, there are several steps to take.

      1. Remove the old trigger

      DROP TRIGGER IF EXISTS nx_trig_ft_update ON fulltext;

      2. Remove the old indexes

      SELECT relname FROM pg_class WHERE oid IN
      (SELECT indexrelid FROM pg_index JOIN pg_class ON pg_class.oid = pg_index.indrelid
      WHERE pg_class.relname = 'fulltext' AND NOT indisunique AND NOT indisprimary);

      This returns a list like:

      fulltext_fulltext_description_idx
      fulltext_fulltext_idx
      fulltext_fulltext_title_idx

      You must then drop these indexes:

      DROP INDEX IF EXISTS fulltext_fulltext_idx;
      DROP INDEX IF EXISTS fulltext_fulltext_title_idx;
      DROP INDEX IF EXISTS fulltext_fulltext_description_idx;

      3. Identify the columns to be upgraded

      SELECT column_name FROM information_schema.columns WHERE table_schema = 'public' AND table_name = 'fulltext' AND data_type = 'tsvector';

      This returns a list like:

      fulltext
      simpletext
      binarytext
      fulltext_title
      simpletext_title
      binarytext_title
      fulltext_description
      simpletext_description
      binarytext_description

      Make sure you have defined the "tsvector_to_text" function from the attached file tsvector_to_text.sql.txt

      You must then upgrade each column. THIS MAY TAKE A LONG TIME (around 10 rows per second on one test machine for each ALTER):

      ALTER TABLE fulltext ALTER simpletext TYPE text USING tsvector_to_text(simpletext);
      ALTER TABLE fulltext ALTER binarytext TYPE text USING tsvector_to_text(binarytext);
      ALTER TABLE fulltext ALTER fulltext TYPE text USING tsvector_to_text(fulltext);
      ALTER TABLE fulltext ALTER simpletext_title TYPE text USING tsvector_to_text(simpletext_title);
      ALTER TABLE fulltext ALTER binarytext_title TYPE text USING tsvector_to_text(binarytext_title);
      ALTER TABLE fulltext ALTER fulltext_title TYPE text USING tsvector_to_text(fulltext_title);
      ALTER TABLE fulltext ALTER simpletext_description TYPE text USING tsvector_to_text(simpletext_description);
      ALTER TABLE fulltext ALTER binarytext_description TYPE text USING tsvector_to_text(binarytext_description);
      ALTER TABLE fulltext ALTER fulltext_description TYPE text USING tsvector_to_text(fulltext_description);

      4. Recreate the indexes on all the "fulltext*" columns. THIS WILL ALSO TAKE SOME TIME:

      CREATE INDEX fulltext_fulltext_idx ON fulltext USING GIN(NX_TO_TSVECTOR(fulltext));
      CREATE INDEX fulltext_fulltext_title_idx ON fulltext USING GIN(NX_TO_TSVECTOR(fulltext_title));
      CREATE INDEX fulltext_fulltext_description_idx ON fulltext USING GIN(NX_TO_TSVECTOR(fulltext_description));

      Show
      When migrating a database from the previous model for the fulltext table to the new one, there are several steps to take. 1. Remove the old trigger DROP TRIGGER IF EXISTS nx_trig_ft_update ON fulltext; 2. Remove the old indexes SELECT relname FROM pg_class WHERE oid IN (SELECT indexrelid FROM pg_index JOIN pg_class ON pg_class.oid = pg_index.indrelid WHERE pg_class.relname = 'fulltext' AND NOT indisunique AND NOT indisprimary); This returns a list like: fulltext_fulltext_description_idx fulltext_fulltext_idx fulltext_fulltext_title_idx You must then drop these indexes: DROP INDEX IF EXISTS fulltext_fulltext_idx; DROP INDEX IF EXISTS fulltext_fulltext_title_idx; DROP INDEX IF EXISTS fulltext_fulltext_description_idx; 3. Identify the columns to be upgraded SELECT column_name FROM information_schema.columns WHERE table_schema = 'public' AND table_name = 'fulltext' AND data_type = 'tsvector'; This returns a list like: fulltext simpletext binarytext fulltext_title simpletext_title binarytext_title fulltext_description simpletext_description binarytext_description Make sure you have defined the "tsvector_to_text" function from the attached file tsvector_to_text.sql.txt You must then upgrade each column. THIS MAY TAKE A LONG TIME (around 10 rows per second on one test machine for each ALTER): ALTER TABLE fulltext ALTER simpletext TYPE text USING tsvector_to_text(simpletext); ALTER TABLE fulltext ALTER binarytext TYPE text USING tsvector_to_text(binarytext); ALTER TABLE fulltext ALTER fulltext TYPE text USING tsvector_to_text(fulltext); ALTER TABLE fulltext ALTER simpletext_title TYPE text USING tsvector_to_text(simpletext_title); ALTER TABLE fulltext ALTER binarytext_title TYPE text USING tsvector_to_text(binarytext_title); ALTER TABLE fulltext ALTER fulltext_title TYPE text USING tsvector_to_text(fulltext_title); ALTER TABLE fulltext ALTER simpletext_description TYPE text USING tsvector_to_text(simpletext_description); ALTER TABLE fulltext ALTER binarytext_description TYPE text USING tsvector_to_text(binarytext_description); ALTER TABLE fulltext ALTER fulltext_description TYPE text USING tsvector_to_text(fulltext_description); 4. Recreate the indexes on all the "fulltext*" columns. THIS WILL ALSO TAKE SOME TIME: CREATE INDEX fulltext_fulltext_idx ON fulltext USING GIN(NX_TO_TSVECTOR(fulltext)); CREATE INDEX fulltext_fulltext_title_idx ON fulltext USING GIN(NX_TO_TSVECTOR(fulltext_title)); CREATE INDEX fulltext_fulltext_description_idx ON fulltext USING GIN(NX_TO_TSVECTOR(fulltext_description));

      Description

      We must store the fulltext columns (simpletext, binarytext, fulltext) in clear text instead of in a tsvector.
      This is needed to:

      • allow LIKE-based matches on the text (NXP-5450),
      • allow reindexing with a different analyzer without re-extracting all the text from the documents.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                fguillaume Florent Guillaume
                Reporter:
                fguillaume Florent Guillaume
                Participants:
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: