Handle documents with non-standard or empty digest

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: NoFixVersionApplicable
    • Fix Version/s: 5.0.0
    • Component/s: Synchronizer
    • Release Notes Summary:
      Handle documents with non-standard digest
    • Release Notes Description:
      When using the S3 upload provider, the digest of a document can be non-standard, i.e. not MD5, SHA256, etc. In that case, Drive cannot download the file from the server because it cannot verify its integrity.
      The situation is worse when S3 encryption (SSE-KMS) is used: every time a change is made on the server, Drive receives a different non-standard digest for each action, even when the file itself was not modified.

      Before Drive 5.0.0, such documents were simply skipped: there was no way to synchronize them.

      A lot of work was done on the server to asynchronously compute the real digest of those documents.
      When a file is created on the server, the document is created and the file is attached to it with the non-standard digest set by S3. A worker on the server then computes the digest and updates the document accordingly.

      With Drive 5.0.0, when a document with a non-standard digest is received, it is saved in the database just like any other document, except that it is not synchronized yet.
      Later, when the server has finished computing the digest, Drive receives a new event with the up-to-date, correct digest. At that point, Drive is able to synchronize the document.

      Note:
      There are other documents with exotic digests, such as Live Connect ones. In that case, Drive keeps the documents in the local database but never synchronizes them. This leaves the door open for future versions of Drive, which might handle those documents differently.

    • Epic Link:
    • Sprint:
      nxDrive 11.1.34, nxDrive 11.1.35, nxDrive 11.2.8, nxDrive 11.2.13
    • Story Points:
      5

      Description

      Error

      This error is in the same vein as NXDRIVE-1973.

      Currently, when a document is non-folderish and has a digest but no digest algorithm, we try to guess the algorithm. This may have worked for some documents. But there are documents, typically Live Connect ones, whose digest makes the guess wrong. For instance, here is a document from the intranet:

      {
          "id": "defaultFileSystemItemFactory#default#4b61d5ed-4b04-4ce8-bd68-43ca44b3b02f",
          "parentId": "defaultSyncRootFolderItemFactory#default#024a0234-9131-486f-b387-884e4bb010b6",
          "name": "Nuxeo Cloud Presentation.pptx",
          "folder": false,
          "creator": "jallouch",
          "lastContributor": "jallouch",
          "creationDate": 1585921066534,
          "lastModificationDate": 1585921066534,
          "canRename": true,
          "canDelete": true,
          "lockInfo": null,
          "path": "/org.nuxeo.drive.service.impl.DefaultTopLevelFolderItemFactory#/defaultSyncRootFolderItemFactory#default#024a0234-9131-486f-b387-884e4bb010b6/defaultFileSystemItemFactory#default#4b61d5ed-4b04-4ce8-bd68-43ca44b3b02f",
          "userName": "gcarlin",
          "downloadURL": "nxfile/default/4b61d5ed-4b04-4ce8-bd68-43ca44b3b02f/blobholder:0/Nuxeo%20Cloud%20Presentation.pptx",
          "digestAlgorithm": null,
          "digest": "9e25dad57bbcd7b08a0adf91e701b0cc",
          "canUpdate": true
      }
      

      Here, the algorithm guess returns "MD5" because the digest is 32 hexadecimal characters long. But the guess is wrong: the document will never be synced and will end up in error because the integrity check fails.
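
      To illustrate why a length-based guess is fragile, here is a minimal Python sketch. It is not Drive's actual code: the names guess_digest_algorithm and matches, and the length table, are hypothetical and chosen only for this example.

```python
import hashlib
import re
from typing import Optional

# Hex digest lengths of a few common algorithms (illustrative table, an assumption).
_LENGTH_TO_ALGO = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}
_HEX = re.compile(r"^[0-9a-fA-F]+$")


def guess_digest_algorithm(digest: str) -> Optional[str]:
    """Guess an algorithm from the digest shape only (hypothetical helper)."""
    if not digest or not _HEX.match(digest):
        return None
    return _LENGTH_TO_ALGO.get(len(digest))


def matches(content: bytes, digest: str, algorithm: str) -> bool:
    """Integrity check: hash the content and compare with the expected digest."""
    return hashlib.new(algorithm, content).hexdigest() == digest.lower()


# The digest from the document above "looks like" MD5 ...
guessed = guess_digest_algorithm("9e25dad57bbcd7b08a0adf91e701b0cc")
print(guessed)
```

      The guess only looks at the shape of the string. If the value was not produced by hashing the file content with that algorithm, matches() fails on every download attempt, which corresponds to the repeated integrity check failures described above.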

      Fix

      Handling those documents will be possible once NXP-30044 is done.

      Actions:

      • Documents without a valid digest are currently skipped. Search for UnknownDigest occurrences in the code.
      • Instead of skipping them, we need to add them to the database just like other documents, but in a transient state: we cannot sync them yet, we just need to keep track of them in the database.
      • When the server sends the blobDigestUpdated event, it tells us the digest is available and the document can then be synced.
      • Update the digest in the database and sync the document.
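
      The actions above can be sketched as follows. This is only an illustration under stated assumptions: TrackedDoc, Tracker, the in-memory "database", and the event arguments are hypothetical. Only the blobDigestUpdated event name comes from this ticket.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Algorithms considered "standard" in this sketch (an assumption).
STANDARD_ALGORITHMS = {"md5", "sha1", "sha256", "sha512"}


def is_syncable(digest: Optional[str], algorithm: Optional[str]) -> bool:
    """A document can be synced only with a digest and a known algorithm."""
    return bool(digest) and (algorithm or "").lower() in STANDARD_ALGORITHMS


@dataclass
class TrackedDoc:
    uid: str
    digest: Optional[str]
    digest_algorithm: Optional[str]
    pending: bool  # transient state: tracked in the database, not synced yet


class Tracker:
    """Hypothetical stand-in for the local database and the sync queue."""

    def __init__(self) -> None:
        self.rows: Dict[str, TrackedDoc] = {}
        self.to_sync: List[str] = []

    def on_remote_change(
        self, uid: str, digest: Optional[str], algorithm: Optional[str]
    ) -> TrackedDoc:
        # Always save the document instead of skipping it.
        doc = TrackedDoc(uid, digest, algorithm, not is_syncable(digest, algorithm))
        self.rows[uid] = doc
        if not doc.pending:
            self.to_sync.append(uid)
        return doc

    def on_event(
        self, event_id: str, uid: str, digest: Optional[str], algorithm: Optional[str]
    ) -> None:
        # The server signals that the real digest has been computed.
        if event_id == "blobDigestUpdated" and uid in self.rows:
            self.on_remote_change(uid, digest, algorithm)


tracker = Tracker()
tracker.on_remote_change("doc-1", "9e25dad57bbcd7b08a0adf91e701b0cc", None)
tracker.on_event("blobDigestUpdated", "doc-1", "d41d8cd98f00b204e9800998ecf8427e", "md5")
print(tracker.to_sync)
```

      In this sketch the document is stored as soon as it is seen, stays in the pending state while its digest cannot be verified, and is queued for synchronization only once the updated digest arrives.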

      Time Tracking

      • Original Estimate: Not Specified
      • Remaining Estimate: 0m
      • Time Spent: 1w 5h