- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Component/s: Core
- Release Notes Summary: Orphan blobs (binaries) are now deleted from blob stores when a document is deleted, when a document blob property is edited, and when a blob is dispatched to another blob provider
- Release Notes Description:
- Epic Link:
- Tags:
- Team: PLATFORM
- Sprint: nxplatform #79, nxplatform #80, nxplatform #81, nxplatform #82
- Story Points: 8
Today, when the default blob dispatcher moves a binary from one blob provider to another (following the blob dispatcher rules definition), it performs a copy and does not delete the source binary, even if that binary is orphaned, i.e. no longer referenced by any blob property in the backend.
In the case of the retention feature, when a document becomes a record, its main blob is moved to a dedicated blob provider (and hence a dedicated blob store). See the recommended configuration at https://doc.nuxeo.com/nxdoc/nuxeo-retention-installation-standard/#configure-via-xml-contribution:
<extension target="org.nuxeo.ecm.core.blob.DocumentBlobManager" point="configuration">
  <blobdispatcher>
    <class>org.nuxeo.ecm.core.blob.DefaultBlobDispatcher</class>
    <property name="records">records</property>
    <property name="default">default</property>
  </blobdispatcher>
</extension>
Since the blob dispatcher copies the blob from the default provider to the records provider and does not delete the source blob in the default bucket when it becomes unreferenced (orphaned), the main blob is duplicated in both associated blob stores. This increases storage costs unless a full orphaned binaries GC is performed, which is itself costly.
The source blob is not deleted because the default blob store uses a digest key strategy, so a stored blob is potentially referenced by other documents. Scanning the repository synchronously to check whether the blob can be deleted would be too expensive.
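To illustrate why a digest key strategy leads to shared binaries, here is a minimal sketch. The `digestKey` helper is hypothetical and the choice of MD5 is an assumption made for illustration; it is not the platform's actual code. It only shows that identical content yields an identical key, so two documents can end up pointing at the same stored binary.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestKeySketch {

    /** Hypothetical helper: derives a blob key from the content digest (MD5 is an assumption). */
    public static String digestKey(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "same file content".getBytes(StandardCharsets.UTF_8);
        String keyOfDocA = digestKey(content);
        String keyOfDocB = digestKey(content.clone());
        // Both documents resolve to one stored binary, so deleting it on behalf
        // of document A without a reference check would break document B.
        System.out.println("shared key: " + keyOfDocA.equals(keyOfDocB));
    }
}
```

This is why a deletion cannot be decided from a single document alone: the key has to be checked against every other blob property first.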
To improve on this, we would like to append the keys of blobs that are candidates for deletion to a stream, as stream records.
A computation consuming this stream would then query the database to check that each blob key is not referenced by another document's blob field before removing the blob.
The implementation could leverage the domain event feature.
Note that such an improvement could also be leveraged when:
- a document is removed: all the blob keys held in the document's blob fields could be added to this stream in order to clean up its binaries.
- a blob field value of a document is edited: the old blob key, if any, could be added to this stream in order to check whether the associated blob can be deleted.
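The proposed flow (candidate keys appended to a stream, then a computation that checks references before deleting) can be sketched as follows. This is a simplified in-memory illustration, not the actual implementation: the class `OrphanBlobGcSketch`, its fields, and its methods are all hypothetical stand-ins, with a `Queue` in place of the stream, a `Map` in place of the repository, and a `Set` in place of the blob store.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class OrphanBlobGcSketch {

    // Stand-in for the repository: document id -> blob keys held by its blob fields.
    public final Map<String, Set<String>> documents = new HashMap<>();
    // Stand-in for a blob store: keys of the binaries currently stored.
    public final Set<String> blobStore = new HashSet<>();
    // Stand-in for the stream of deletion candidates.
    public final Queue<String> candidates = new ArrayDeque<>();

    public void createDocument(String docId, String... blobKeys) {
        documents.put(docId, new HashSet<>(Arrays.asList(blobKeys)));
        blobStore.addAll(Arrays.asList(blobKeys));
    }

    // Document removal: every key it held becomes a deletion candidate.
    public void removeDocument(String docId) {
        Set<String> keys = documents.remove(docId);
        if (keys != null) {
            candidates.addAll(keys);
        }
    }

    // Blob field edit: the old key becomes a deletion candidate.
    public void replaceBlob(String docId, String oldKey, String newKey) {
        Set<String> keys = documents.get(docId);
        if (keys != null && keys.remove(oldKey)) {
            keys.add(newKey);
            blobStore.add(newKey);
            candidates.add(oldKey);
        }
    }

    // Stand-in for the database query "is this key referenced by any blob field?".
    public boolean isReferenced(String key) {
        return documents.values().stream().anyMatch(keys -> keys.contains(key));
    }

    // The consuming computation: delete only candidates that are truly orphaned.
    public int runComputation() {
        int deleted = 0;
        String key;
        while ((key = candidates.poll()) != null) {
            if (!isReferenced(key) && blobStore.remove(key)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        OrphanBlobGcSketch gc = new OrphanBlobGcSketch();
        gc.createDocument("docA", "k1");
        gc.createDocument("docB", "k1", "k2");
        gc.removeDocument("docA"); // k1 is still referenced by docB
        System.out.println("deleted after removing docA: " + gc.runComputation());
        gc.replaceBlob("docB", "k2", "k3"); // k2 is now orphaned
        System.out.println("deleted after editing docB: " + gc.runComputation());
    }
}
```

The point of the design is that producers only append keys, which is cheap and can happen synchronously, while the expensive reference check runs asynchronously in the consumer.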
- causes
  - NXP-32061 Fix incremental Blob GC when async digest is enabled (Resolved)
  - NXP-31743 Fix random failure in ColdStorage unit tests (Resolved)
  - NXP-31833 Prevent GC with a Cross Repository shared blob provider configuration (Resolved)
- depends on
  - NXP-29516 Allow efficient search by blob key (Resolved)
- is duplicated by
  - NXP-28679 Improve blob GC to do cleanup for selected blobs (Resolved)
  - NXP-30197 Implement immediate blob delete (Resolved)
  - NXP-28523 Make it possible to delete the binary as soon as the associated document(s) are permanently deleted (Resolved)
- is related to
  - NXP-32308 Fix Garbage Collection when default blob provider blob keys can be both un/prefixed (Resolved)
  - NXP-30070 Migration to new denormalized ecm:blobKeys (Resolved)
  - NXP-31892 Add unit test for Immediate GC on document with Cold Storage content (Resolved)
  - NXP-28565 Make orphan binaries GC scalable (Resolved)
  - NXP-31714 Improve CoreFeature to wait for the end of document's blob GC (Resolved)
  - NXP-31794 Add an nuxeo.conf property to disable Immediate blob Garbage Collection (Resolved)
  - NXP-31964 Orphan Version should be removed as it goes along (Versions minorGC) (Resolved)
  - NXP-31730 Ensure file existence when using batch upload in cluster mode and Cloud storage (Resolved)