  1. Nuxeo Platform
  2. NXP-12452

Add a metadata extraction framework


    Details

    • Type: Epic
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.1
    • Component/s: Image Management
    • Tags:
    • Backlog priority: 600
    • Sprint: TGV 7.1-2, TGV 7.1-3, TGV 7.1-4, TGV 7.1-5

      Description

      Service

      A new MetadataService with the following methods (to be improved):

      • void readMetadata(DocumentModel doc)
      • void writeMetadata(DocumentModel doc)
      • Map<String, String> readMetadata(String processorName, Blob blob, List<String> metadataNames)
      • void writeMetadata(String processorName, Blob blob, Map<String, String> metadata)
      • ...

      Contributions

      Processor

      Define a named processor: a class that reads metadata from, and writes metadata to, a Blob.
      Reading returns a Map<String, String>; writing takes a Map<String, String>.

      • Map<String, String> readMetadata(Blob blob)
      • void writeMetadata(Blob blob, Map<String, String> metadata)
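As an illustration of how a named processor could sit behind the service methods that take a processorName, here is a minimal sketch. It is not the actual Nuxeo implementation: byte[] stands in for the Nuxeo Blob type, and InMemoryProcessor and ProcessorRegistry are hypothetical names invented for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: byte[] stands in for the Nuxeo Blob type. */
final class ProcessorSketch {

    /** Mirrors the two processor methods listed in the ticket. */
    interface MetadataProcessor {
        Map<String, String> readMetadata(byte[] blob);

        void writeMetadata(byte[] blob, Map<String, String> metadata);
    }

    /** Hypothetical processor that keeps metadata in memory instead of calling ExifTool. */
    static final class InMemoryProcessor implements MetadataProcessor {
        // Keyed by blob identity, since byte[] has no value-based equals/hashCode.
        private final Map<byte[], Map<String, String>> store = new IdentityHashMap<>();

        @Override
        public Map<String, String> readMetadata(byte[] blob) {
            Map<String, String> known = store.getOrDefault(blob, Collections.emptyMap());
            return new LinkedHashMap<>(known);
        }

        @Override
        public void writeMetadata(byte[] blob, Map<String, String> metadata) {
            store.computeIfAbsent(blob, b -> new LinkedHashMap<>()).putAll(metadata);
        }
    }

    /** Hypothetical registry that resolves a processor by its contributed id. */
    static final class ProcessorRegistry {
        private final Map<String, MetadataProcessor> processors = new HashMap<>();

        void register(String id, MetadataProcessor processor) {
            processors.put(id, processor);
        }

        private MetadataProcessor lookup(String processorName) {
            MetadataProcessor processor = processors.get(processorName);
            if (processor == null) {
                throw new IllegalArgumentException("Unknown processor: " + processorName);
            }
            return processor;
        }

        /** Reads all metadata, then keeps only the requested names. */
        Map<String, String> readMetadata(String processorName, byte[] blob, List<String> metadataNames) {
            Map<String, String> all = lookup(processorName).readMetadata(blob);
            Map<String, String> result = new LinkedHashMap<>();
            for (String name : metadataNames) {
                if (all.containsKey(name)) {
                    result.put(name, all.get(name));
                }
            }
            return result;
        }

        void writeMetadata(String processorName, byte[] blob, Map<String, String> metadata) {
            lookup(processorName).writeMetadata(blob, metadata);
        }
    }

    public static void main(String[] args) {
        ProcessorRegistry registry = new ProcessorRegistry();
        registry.register("exifTool", new InMemoryProcessor());
        byte[] blob = {1, 2, 3};
        registry.writeMetadata("exifTool", blob, Collections.singletonMap("xmp:CreatorTool", "demo"));
        System.out.println(registry.readMetadata("exifTool", blob, Arrays.asList("xmp:CreatorTool")));
    }
}
```

Keeping the name-based lookup in one place means an unknown processor id fails fast, before any Blob is touched.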

      A processor using ExifTool (will be the default one):

      <extension target="org.nuxeo.metadata.MetadataService"
        point="processors">
        <processor id="exifTool" class="org.nuxeo.metadata.ExifToolProcessor" />
      </extension>
      

      Consider sending custom events so that custom listeners can react to metadata extraction in the future (for example, to run an ImageMagick task).

      MetadataMapping

      Define a mapping between doc properties and metadata properties.

      <extension target="org.nuxeo.metadata.MetadataService"
        point="metadataMappings">
        <metadataMapping id="xmp" processor="exifTool" blobXPath="file:content">
           <metadata name="tiff:ImageWidth" xpath="xmp:ImageWidth" />
           <metadata name="tiff:ImageLength" xpath="xmp:ImageLength"  />
           <metadata name="xmp:CreatorTool" xpath="xmp:CreatorTool"  />
        </metadataMapping>
      </extension>
      

      If blobXPath is empty, use BlobHolder.getBlob() to get the Blob.
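To make the mapping semantics concrete, here is a small sketch of how a metadataMapping could be applied when reading. It assumes the mapping is held as a plain map from metadata name to document xpath; the class and method names are hypothetical, and the BLOB_HOLDER marker only stands in for the BlobHolder.getBlob() fallback described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: a metadataMapping modelled as metadata name -> document xpath. */
final class MappingSketch {

    /** Hypothetical marker meaning "use BlobHolder.getBlob()". */
    static final String BLOB_HOLDER = "<BlobHolder>";

    /** Returns the xpath to read the Blob from, or the BlobHolder marker when blobXPath is empty. */
    static String resolveBlobSource(String blobXPath) {
        boolean empty = blobXPath == null || blobXPath.trim().isEmpty();
        return empty ? BLOB_HOLDER : blobXPath;
    }

    /**
     * Translates metadata read from a Blob into document property values.
     * Metadata that the Blob does not carry is skipped rather than set to null.
     */
    static Map<String, String> toDocumentProperties(Map<String, String> metadataToXPath,
                                                    Map<String, String> blobMetadata) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadataToXPath.entrySet()) {
            String value = blobMetadata.get(entry.getKey());
            if (value != null) {
                properties.put(entry.getValue(), value);
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("tiff:ImageWidth", "xmp:ImageWidth");
        mapping.put("xmp:CreatorTool", "xmp:CreatorTool");

        Map<String, String> fromBlob = new LinkedHashMap<>();
        fromBlob.put("tiff:ImageWidth", "640");

        System.out.println(resolveBlobSource(""));
        System.out.println(toDocumentProperties(mapping, fromBlob));
    }
}
```

Skipping absent metadata is one possible choice; it ties into the open "merge", "override" and "append" question in this ticket.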

      Should we add a "policy" on each property to specify whether it is read-only, write-only or read-write?
      For now, I would leave it out and decide later whether to add it.

      We should also think about "merge", "override" and "append" behaviors.

      Rules

      <extension target="org.nuxeo.metadata.MetadataService"
        point="rules">
        <rule id="default" order="0" enabled="true" async="true|false">
          <metadataMappings>
            <metadataMapping-id>xmp</metadataMapping-id>
            ...
          </metadataMappings>
          <filters>
            <filter-id>hasXMPFacet</filter-id>
          </filters>
        </rule>
      </extension>
      

      Rules apply based on their filters. Each filter is executed with the following variables in the action context:

      • the document, if any
      • the Blob
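The rule selection described above could look roughly like the following sketch. Several points are assumptions, not statements from this ticket: a rule applies only when all of its filters accept the context, rules are visited in ascending "order", and FilterContext is a hypothetical stand-in for the action context holding the document and the Blob.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Sketch only: selecting the rules that apply to a document/Blob pair. */
final class RuleSketch {

    /** Hypothetical stand-in for the action context (document, if any, and Blob). */
    static final class FilterContext {
        final Set<String> documentFacets; // null when there is no document
        final String blobMimeType;

        FilterContext(Set<String> documentFacets, String blobMimeType) {
            this.documentFacets = documentFacets;
            this.blobMimeType = blobMimeType;
        }
    }

    static final class Rule {
        final String id;
        final int order;
        final boolean enabled;
        final List<Predicate<FilterContext>> filters;

        Rule(String id, int order, boolean enabled, List<Predicate<FilterContext>> filters) {
            this.id = id;
            this.order = order;
            this.enabled = enabled;
            this.filters = filters;
        }
    }

    /** Keeps enabled rules whose filters all pass (assumption), sorted by ascending order. */
    static List<Rule> applicableRules(List<Rule> rules, FilterContext context) {
        List<Rule> result = new ArrayList<>();
        for (Rule rule : rules) {
            if (rule.enabled && rule.filters.stream().allMatch(f -> f.test(context))) {
                result.add(rule);
            }
        }
        result.sort(Comparator.comparingInt(r -> r.order));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical equivalent of the hasXMPFacet filter from the contribution.
        Predicate<FilterContext> hasXmpFacet =
                ctx -> ctx.documentFacets != null && ctx.documentFacets.contains("XMP");
        List<Rule> rules = Arrays.asList(
                new Rule("default", 0, true, Collections.singletonList(hasXmpFacet)));
        FilterContext context = new FilterContext(new HashSet<>(Arrays.asList("XMP")), "image/jpeg");
        for (Rule rule : applicableRules(rules, context)) {
            System.out.println(rule.id);
        }
    }
}
```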

      Default listener

      A synchronous listener on document creation and modification.
      It triggers the MetadataService to:

      • on doc creation, read the metadata and update the document (iterate on all rules) if the Blob is not empty.
      • on doc modification, for each metadata mapping on each rule:
        • if Blob dirty and document metadata not dirty, read metadata from Blob to doc
        • if Blob dirty and document metadata dirty, write metadata from doc to Blob
        • if Blob not dirty and document metadata dirty, write metadata from doc to Blob
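The dirty-state cases above reduce to a small decision function, sketched here. The enum and method names are hypothetical, and returning NONE when nothing is dirty is an assumption, since the ticket does not cover that case.

```java
/** Sketch only: the read/write decision taken by the default listener. */
final class SyncDecisionSketch {

    enum Action { READ_FROM_BLOB, WRITE_TO_BLOB, NONE }

    /** On creation, metadata is read only when the document has a non-empty Blob. */
    static Action onCreation(boolean blobPresent) {
        return blobPresent ? Action.READ_FROM_BLOB : Action.NONE;
    }

    /**
     * On modification, dirty document metadata always wins and is written to the Blob;
     * otherwise a dirty Blob is read back into the document.
     */
    static Action onModification(boolean blobDirty, boolean metadataDirty) {
        if (metadataDirty) {
            return Action.WRITE_TO_BLOB;
        }
        if (blobDirty) {
            return Action.READ_FROM_BLOB;
        }
        return Action.NONE; // assumption: nothing to do when neither side changed
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean blobDirty : flags) {
            for (boolean metadataDirty : flags) {
                System.out.println("blobDirty=" + blobDirty + ", metadataDirty=" + metadataDirty
                        + " -> " + onModification(blobDirty, metadataDirty));
            }
        }
    }
}
```

Written this way, it is visible that the Blob's dirty state only matters when the document metadata is clean.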

      When iterating over the rules, each rule's "async" attribute decides what happens: either call the processor directly to read/write, or put the dirty properties to process in the event context and then fire an event for an async listener that will process them.
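A rough sketch of that sync/async split follows. It is a simplification under stated assumptions: the returned list of rule ids stands in for what would be placed in the event context (the ticket speaks of dirty properties), firing the event itself is left out, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: processing sync rules inline and deferring async ones. */
final class DispatchSketch {

    static final class Rule {
        final String id;
        final boolean async;

        Rule(String id, boolean async) {
            this.id = id;
            this.async = async;
        }
    }

    /**
     * Runs each synchronous rule through processNow and returns the ids of the
     * asynchronous rules, which a caller would hand over to an async listener.
     */
    static List<String> dispatch(List<Rule> rules, Consumer<String> processNow) {
        List<String> deferred = new ArrayList<>();
        for (Rule rule : rules) {
            if (rule.async) {
                deferred.add(rule.id);
            } else {
                processNow.accept(rule.id);
            }
        }
        return deferred;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(new Rule("default", false), new Rule("heavy", true));
        List<String> deferred = dispatch(rules, id -> System.out.println("processed inline: " + id));
        System.out.println("deferred to async listener: " + deferred);
    }
}
```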

      Operations

      At least:

      • Trigger a contributed metadataMapping, identified by its name, on a document.
      • Write metadata to a Blob (xpath parameter, or the BlobHolder if empty) from a document (input), given a custom metadata mapping defined in a Properties parameter (xpath=metadataName), using a named processor (exifTool for instance).
      • Write metadata to a Blob (input) from a custom metadata mapping defined in a Properties parameter (metadataName=value), using a named processor (exifTool for instance).
      • Read metadata from a Blob (xpath parameter, or the BlobHolder if empty) to a document (input), given a custom metadata mapping defined in a Properties parameter (xpath=metadataName), using a named processor (exifTool for instance).
      • Read metadata from a Blob (input), given a custom list of metadata names defined in a StringList parameter (metadataName1, metadataName2, ...), using a named processor (exifTool for instance), and put the result (a Map) in the Context. The list is optional: when it is omitted, all metadata returned by ExifTool is read.
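For the operations that take a mapping as a Properties parameter, the key=value lines have to be turned into a map. The sketch below parses them by hand and assumes one pair per line; it is not the Nuxeo Automation parser. It deliberately avoids java.util.Properties, which treats ':' as a key/value separator and would therefore split keys such as xmp:ImageWidth in the wrong place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: parsing "xpath=metadataName" lines into an ordered map. */
final class MappingParamSketch {

    /** Splits each non-empty line on its first '=' and trims both sides. */
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int separator = trimmed.indexOf('=');
            if (separator <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
            }
            String key = trimmed.substring(0, separator).trim();
            String value = trimmed.substring(separator + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        String parameter = "xmp:ImageWidth=tiff:ImageWidth\nxmp:CreatorTool=xmp:CreatorTool";
        System.out.println(parse(parameter));
    }
}
```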

      => We probably need to review the Service and add new methods to handle that.

    People

    • Votes: 1
    • Watchers: 6

    Time Tracking

    • Estimated: 1 week
    • Remaining: 1 week
    • Logged: Not Specified