Nuxeo Core should be able to manage blobs that are physically stored in an external filesystem.
At a low level, this means we have to store a simple URL (or path).
From the API, blobs stored in Nuxeo and blobs stored externally should behave the same way for reading.
For updates, we would probably need an ExternalBlob object.
Use case
--------
It should be possible to reference blobs using a new field,
nxs:externalContent. These blobs are not stored in the repository, but on
the file system. The API is transparent for this kind of document
(getPropertyValue retrieves a blob), and indexing is not impacted (blob
content will be indexed too).
It should work in complex properties too (list of external blobs, complex
prop or list of complex props holding an external blob).
Technical Architecture
----------------------
The string stored on the property kept in the repo is a URI that looks like:
protocol://namespace/for/this/field/localname
For instance, we could have:
protocol://nuxeo/externalBlob/myfiles/myfile.jpg
The namespace (here protocol://namespace/for/this/field/) is registered with
the type service as the namespace for the field used on a given schema
property. It is used to select the logic that retrieves the file.
For instance, a namespace can be tied to a filesystem retrieval mechanism,
where the folder containing the data is configurable. In that case, the rest
of the URI (here localname) can be the relative path to the file under the
container folder.
For instance, we would have something like:
<extension target="org.nuxeo.ecm.core.schema.TypeService"
    point="externalContent">
  <property name="myschema:myExternalStringProperty"
      namespace="http://nuxeo/fileSystem/externalBlob/fileSystem" />
  <adapter namespace="http://nuxeo/fileSystem/externalBlob/fileSystem"
      class="org.nuxeo.ecm.core.adapters.FileSystemExternalBlobAdapter">
    <property name="container">/path/to/container/folder/</property>
  </adapter>
</extension>
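The URI layout described above (a registered namespace followed by a local name) can be sketched as a small helper. This is only an illustration: the class ExternalBlobUri and its methods are hypothetical and not part of Nuxeo, and the sketch assumes the registered namespace ends with "/".

```java
import java.util.Optional;

/** Hypothetical helper (not a Nuxeo class): splits and builds external blob URIs. */
final class ExternalBlobUri {

    private ExternalBlobUri() {
    }

    /**
     * Returns the part of the URI that follows the namespace,
     * or an empty Optional when the URI does not belong to that namespace.
     */
    static Optional<String> localName(String uri, String namespace) {
        if (uri == null || namespace == null || !uri.startsWith(namespace)) {
            return Optional.empty();
        }
        return Optional.of(uri.substring(namespace.length()));
    }

    /** Builds the string to store on the property: namespace + local name. */
    static String build(String namespace, String localName) {
        // Assumption: the namespace already ends with "/".
        return namespace + localName;
    }
}
```

With the namespace protocol://nuxeo/externalBlob/ (an assumed split of the example URI), localName applied to protocol://nuxeo/externalBlob/myfiles/myfile.jpg yields myfiles/myfile.jpg, which a filesystem adapter could treat as the path relative to its container folder.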
The FileSystemExternalBlobAdapter class will implement an interface
(ExternalBlobAdapter) with the following API:
- String getNamespace();
- void setNamespace(String namespace);
- String getUri(Serializable object, Map<String, Serializable> context)
throws PropertyException;
- Blob getBlob(String uri, Map<String, Serializable> context)
throws PropertyException;
For this class:
- the method getBlob should retrieve the file placed at
/path/to/container/folder/relative/path/to/file, with path/to/file parsed
from uri (keep the end of the uri as relative path). It should throw an
exception if no file is found at this path.
- the method getUri should build the uri for a java.io.File passed as
argument. If the object is not a java.io.File object, throw an
exception. If the file's absolute path does not start with the container
folder path, throw an exception. Otherwise, build the uri as namespace +
relative path.
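The two rules above could be sketched as follows. This is a simplified, standalone illustration rather than the actual adapter: FileSystemAdapterSketch and SketchPropertyException are hypothetical names, getBlob returns a java.io.File where the real interface returns a Blob, and the namespace is assumed to end with "/". The check that the resolved path stays under the container is an extra precaution that the rules above do not require.

```java
import java.io.File;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Stand-in for PropertyException so that the sketch compiles on its own. */
class SketchPropertyException extends Exception {
    SketchPropertyException(String message) {
        super(message);
    }
}

/** Hypothetical, simplified filesystem adapter (returns a File instead of a Blob). */
class FileSystemAdapterSketch {

    private final String namespace; // assumed to end with "/"

    private final Path container; // value of the "container" property

    FileSystemAdapterSketch(String namespace, String containerPath) {
        this.namespace = namespace;
        this.container = new File(containerPath).toPath().toAbsolutePath().normalize();
    }

    /** Builds namespace + relative path for a file located under the container. */
    String getUri(Serializable object, Map<String, Serializable> context)
            throws SketchPropertyException {
        if (!(object instanceof File)) {
            throw new SketchPropertyException("Expected a java.io.File, got: " + object);
        }
        Path path = ((File) object).toPath().toAbsolutePath().normalize();
        if (!path.startsWith(container)) {
            throw new SketchPropertyException("File is not under the container: " + path);
        }
        // Use "/" in the uri whatever the platform separator is.
        String relative = container.relativize(path).toString()
                .replace(File.separatorChar, '/');
        return namespace + relative;
    }

    /** Resolves the end of the uri as a path relative to the container. */
    File getBlob(String uri, Map<String, Serializable> context)
            throws SketchPropertyException {
        if (uri == null || !uri.startsWith(namespace)) {
            throw new SketchPropertyException("Unknown namespace for uri: " + uri);
        }
        Path path = container.resolve(uri.substring(namespace.length())).normalize();
        // Extra precaution (not in the rules above): reject paths escaping the container.
        if (!path.startsWith(container) || !Files.isRegularFile(path)) {
            throw new SketchPropertyException("No file found for uri: " + uri);
        }
        return path.toFile();
    }
}
```

In this sketch the context argument is ignored; it is kept only so the signatures stay close to the interface listed above.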
The adapter classes will be queried from the document properties API to
retrieve a blob. Setter APIs do not have to be implemented for now.
Add the following API to SchemaManager:
- ExternalBlobAdapter getExternalBlobAdapter(String fieldName)
It should look up the configuration to find the appropriate adapter.
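One possible shape for that lookup, assuming the configuration is held as two maps that mirror the property and adapter entries of the extension point. The ExternalBlobAdapterRegistry class and its register methods are hypothetical, and the adapter type is left generic so the sketch stays standalone.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry: field name -> namespace -> adapter. */
class ExternalBlobAdapterRegistry<A> {

    private final Map<String, String> namespaceByField = new HashMap<>();

    private final Map<String, A> adapterByNamespace = new HashMap<>();

    /** Mirrors a property entry: binds a schema field to a namespace. */
    void registerProperty(String fieldName, String namespace) {
        namespaceByField.put(fieldName, namespace);
    }

    /** Mirrors an adapter entry: binds a namespace to an adapter instance. */
    void registerAdapter(String namespace, A adapter) {
        adapterByNamespace.put(namespace, adapter);
    }

    /** Returns the adapter configured for the field, or null if there is none. */
    A getExternalBlobAdapter(String fieldName) {
        String namespace = namespaceByField.get(fieldName);
        return namespace == null ? null : adapterByNamespace.get(namespace);
    }
}
```

Returning null for an unconfigured field is a choice made for this sketch only; throwing an exception would be an equally valid option.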
This API will be called from document properties. Grepping the code for
TypeConstants.CONTENT is a good starting point: use it as an example and
adapt blob management to external blobs.
We also need to check the blob extractor behaviour for indexing (it may need
to be adapted to work correctly in this case).
Problems
--------
- Do we have specific problems to deal with in a multi-machine
environment? => No: it is coded on the CoreSession side, so it will run on
the core machine.
- Do we hardcode the field adapter mechanism like for nxs:content? => Yes.
- The edit use case is not needed and is more complicated (do we delete the
old file, do we update the path or just update the content/or the content of
the field, what happens on concurrent updates of the same file) => just
forget about it for now.