Some files are uploaded to Nuxeo using "direct upload", so their content is never seen by Nuxeo. This makes it impossible to synchronously compute their digest and use it as the blob key. Having the blob key be a digest is useful for:
- compliance with customer rules that require keys to be digests.
To fix this, we will introduce a process that asynchronously computes the digest of each new blob (after downloading it) and "renames" the blob key. This renaming involves moving the blob in the blob provider and finding all documents that use this blob key (thanks to NXP-29516) so that they can be changed to use the new key.
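As an illustration of the digest step, here is a minimal Python sketch that reads a blob as a stream in fixed-size chunks, so the whole content never has to be held in memory. The function name `compute_digest`, the chunk size, and the choice of SHA-256 are assumptions made for this example; the real algorithm would be whichever one the target blob provider requires.

```python
import hashlib
import io
from typing import BinaryIO


def compute_digest(stream: BinaryIO, algorithm: str = "sha256",
                   chunk_size: int = 64 * 1024) -> str:
    """Return the hex digest of a binary stream, reading it chunk by chunk.

    Hypothetical helper: algorithm and chunk size are illustrative defaults.
    """
    hasher = hashlib.new(algorithm)
    # iter() with a sentinel stops as soon as read() returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


# Usage: an in-memory stream stands in for the downloaded blob.
new_key = compute_digest(io.BytesIO(b"direct upload content"))
```

The resulting hex string is what would become the new blob key.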
- Log to a stream the keys of blobs that are copied (S3->S3) from a non-digest blob provider to one requiring digests.
- Have an asynchronous computation consume this stream to:
  - Download (streaming) the blob and compute its actual digest.
  - In the blob provider, copy the blob to a new one with the actual digest as key.
  - Notify the whole cluster that the new key should be used instead of the old one, in case new blobs are created in parallel in the repository during the following steps.
  - In the repository:
    - find all blobs having the old key,
    - replace the old key with the new key,
    - optionally store the digest in the high-level digest field.
  - In the blob provider, remove the old blob.
  - Notify the cluster that work on this blob key is finished.
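The per-key part of the steps above can be sketched as follows. This is an in-memory Python illustration only, not Nuxeo code: `InMemoryBlobProvider`, `Cluster`, `rename_to_digest`, the document dictionaries, and the use of SHA-256 are all hypothetical stand-ins. The stream that feeds keys to the computation is omitted, and the blob is read in one piece for brevity.

```python
import hashlib


class InMemoryBlobProvider:
    """Hypothetical stand-in for the blob provider (for example an S3 bucket)."""

    def __init__(self):
        self.blobs = {}

    def read(self, key):
        return self.blobs[key]

    def copy(self, src, dst):
        self.blobs[dst] = self.blobs[src]

    def delete(self, key):
        self.blobs.pop(key, None)


class Cluster:
    """Hypothetical cluster-wide registry of renames currently in progress."""

    def __init__(self):
        self.pending = {}

    def announce(self, old_key, new_key):
        self.pending[old_key] = new_key

    def finish(self, old_key):
        self.pending.pop(old_key, None)

    def resolve(self, key):
        # Writers call this so that parallel writes use the new key.
        return self.pending.get(key, key)


def rename_to_digest(old_key, provider, repository, cluster):
    """Apply the rename steps for a single blob key and return the new key."""
    data = provider.read(old_key)                   # download the blob
    new_key = hashlib.sha256(data).hexdigest()      # compute its actual digest
    if new_key == old_key:
        return new_key                              # already keyed by digest
    provider.copy(old_key, new_key)                 # copy under the digest key
    cluster.announce(old_key, new_key)              # cluster-wide notification
    for doc in repository:                          # find blobs with the old key
        if doc["blob_key"] == old_key:
            doc["blob_key"] = new_key               # replace old key with new
            doc["digest"] = new_key                 # optional digest field
    provider.delete(old_key)                        # remove the old blob
    cluster.finish(old_key)                         # work on this key is done
    return new_key


# Usage with made-up keys and documents.
provider = InMemoryBlobProvider()
provider.blobs["upload-123"] = b"content"
repository = [{"id": "doc1", "blob_key": "upload-123"},
              {"id": "doc2", "blob_key": "other"}]
cluster = Cluster()
new_key = rename_to_digest("upload-123", provider, repository, cluster)
```

Copying before notifying and deleting only after the repository update means the blob stays readable under at least one key at every point in the sequence.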