  1. Nuxeo Platform
  2. NXP-14292

Provide an audit service based on Elasticsearch


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.9.3
    • Fix Version/s: 6.0
    • Component/s: Elasticsearch

      Description

      Current Audit Service implementation known limitations

      The current Audit Service implementation is based on Hibernate ORM and has several limitations:

      • adding or removing fields directly to the entity object is not possible anymore
        • we used to have this feature but it was dropped because it created deployment issues
      • the `ExtendedInfo` structure is there to allow adding custom fields via EL, but:
        • storage and retrieval are very slow
        • querying is not efficient
      • overall performance is not good
        • massive writes are a problem
        • queries are slow when there are a lot of records
      • it implies a dependency on an old version of Hibernate

      Basically, the current Audit Service is not flexible enough and not fast enough.

      Elasticsearch and Audit

      From Hibernate to ES

      *Compatibility and API*

      We have to take compatibility requirements into account: we don't want to break everything.

      This means that:

      • we probably want to keep the current `ExtendedInfo` structure
        • even if this does not work well with Hibernate, it should work without issue in ES as long as we use a single JSON object
      • we should keep the Service interfaces and simply implement the `nativeQuery` methods using ES DSL
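
As a rough illustration of the two points above, the sketch below stores an audit entry, including its `ExtendedInfo` values, as one JSON object and expresses a simple query as an ES DSL body. The field names (`eventId`, `docUUID`, `principalName`, `eventDate`, `extended`) and both helper functions are assumptions made for the example, not the actual mapping or service API.

```python
import json
from datetime import datetime, timezone


def to_audit_json(event_id, doc_uuid, principal, event_date, extended=None):
    """Serialize one audit entry, extended info included, as a single JSON object."""
    entry = {
        "eventId": event_id,
        "docUUID": doc_uuid,
        "principalName": principal,
        "eventDate": event_date.isoformat(),
        # Custom fields live in a sub-object rather than in a separate table.
        "extended": dict(extended or {}),
    }
    return json.dumps(entry, sort_keys=True)


def native_query(event_id, extended_filters=None):
    """Build an ES query DSL body matching an event id and optional extended fields."""
    clauses = [{"term": {"eventId": event_id}}]
    for key, value in sorted((extended_filters or {}).items()):
        # Extended fields are addressed with a dotted path into the sub-object.
        clauses.append({"term": {"extended." + key: value}})
    return {"query": {"bool": {"must": clauses}}}


if __name__ == "__main__":
    when = datetime(2014, 4, 1, tzinfo=timezone.utc)
    print(to_audit_json("documentCreated", "abc", "Administrator", when, {"reason": "import"}))
    print(json.dumps(native_query("documentCreated", {"reason": "import"})))
```

Because the extended values are ordinary fields of the same JSON object, they can be filtered on like any other field, which is the property the Hibernate-based storage lacks.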

      *Migrating data*

      We will need to write a migration system so that people can migrate from Hibernate to ES without losing data.
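
One possible shape for such a migration, sketched under assumptions: rows are read from the SQL audit table in batches and each batch is rendered in the newline-delimited format expected by the ES bulk API. The row layout, the `audit` index name and the helper names are hypothetical, and depending on the ES version the action line may also need a `_type`.

```python
import json


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_bulk_payload(batch, index):
    """Render a batch in the newline-delimited format of the ES bulk API."""
    lines = []
    for row in batch:
        # Reusing the SQL primary key as _id keeps a re-run idempotent:
        # migrating the same row twice overwrites it instead of duplicating it.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        lines.append(json.dumps(row, sort_keys=True))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [{"id": i, "eventId": "documentModified"} for i in range(5)]
    for chunk in batches(rows, 2):
        print(to_bulk_payload(chunk, "audit"), end="")
```

Keeping the original primary key as the document id is a design choice of this sketch: it lets an interrupted migration be restarted from the beginning without producing duplicate entries.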

      *About PageProviders*

      The current Audit PageProviders use the `nativeQuery` API of the AuditReader service. This means the PageProviders won't be automatically migrated from Hibernate to ES, since we probably don't want to use EJBQL to query Audit entries stored in Elasticsearch.

      However, as long as we provide a new definition for the default providers this should be ok.
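
To give an idea of what an ES-backed provider would send instead of EJBQL, here is a minimal sketch of the paging arithmetic using the `from`, `size` and `sort` keys of an ES search body. The function names and the default `eventDate` sort field are assumptions for illustration only.

```python
def page_request(query, page_index, page_size, sort_field="eventDate"):
    """Build an ES search body for one page of audit entries, newest first."""
    return {
        "query": query,
        # ES paging is offset based: skip the entries of all previous pages.
        "from": page_index * page_size,
        "size": page_size,
        "sort": [{sort_field: {"order": "desc"}}],
    }


def page_count(total_hits, page_size):
    """Number of pages needed to show `total_hits` entries."""
    return (total_hits + page_size - 1) // page_size


if __name__ == "__main__":
    body = page_request({"term": {"docUUID": "abc"}}, page_index=2, page_size=20)
    print(body)
    print(page_count(41, 20))
```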

      *Tx behavior*

      Of course, the ES backend won't be transactional. This should not be an issue, since most usage of `AuditWriter` goes through an async listener.

      However, we have some synchronous accesses (like the ones done by the AuthenticationFilter or the Automation API): we can probably use the "near realtime" API of ES for those.
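
The toy model below illustrates the concern, on the assumption that "near realtime" here means that a freshly indexed entry only becomes searchable after the index is refreshed, and that synchronous callers would ask for a refresh as part of the write. `ToyIndex` is purely an in-memory stand-in written for this example; it is not an ES client.

```python
class ToyIndex:
    """Toy model of a near-real-time index: writes are searchable only after a refresh."""

    def __init__(self):
        self._pending = []      # written but not yet visible to searches
        self._searchable = []   # visible to searches

    def index(self, entry, refresh=False):
        """Store an entry; with refresh=True it is made searchable immediately."""
        self._pending.append(entry)
        if refresh:
            self.refresh()

    def refresh(self):
        """Make every pending entry searchable."""
        self._searchable.extend(self._pending)
        self._pending = []

    def search(self, event_id):
        return [e for e in self._searchable if e["eventId"] == event_id]


if __name__ == "__main__":
    idx = ToyIndex()
    # Async listener path: nobody reads the entry back right away.
    idx.index({"eventId": "loginSuccess", "principalName": "bob"})
    print(len(idx.search("loginSuccess")))
    # Synchronous path: the caller needs read-your-write behavior.
    idx.index({"eventId": "loginSuccess", "principalName": "alice"}, refresh=True)
    print(len(idx.search("loginSuccess")))
```

Forcing a refresh on every write has a cost on a real cluster, so it would make sense to reserve it for the few synchronous callers and let the async listener rely on the periodic refresh.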

      ES index content

      We probably want to have a dedicated index for Audit in addition to the one we created for Documents.

      However, it may be useful to be able to query ES on both Document and Audit data.

      From my understanding, we cannot do joins in ES; however, we could store the Document itself as a nested object inside the Audit entry.

      In addition to making complex queries possible, it will open the way for a complete document history storage (this has been asked for several times).
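
A minimal sketch of that idea, with hypothetical field names (`doc`, `doc.id`, `doc.type`, `doc.title`): each audit entry embeds a snapshot of the document, a single query can then combine audit and document criteria without a join, and the snapshots of one document, ordered by event date, form its history.

```python
def audit_entry_with_document(event_id, principal, event_date, document):
    """Build an audit entry embedding a snapshot of the document it refers to."""
    return {
        "eventId": event_id,
        "principalName": principal,
        "eventDate": event_date,  # ISO 8601 string, so string order is time order
        "doc": dict(document),    # copy, so later changes do not alter the snapshot
    }


def query_events_on_documents(event_id, doc_type):
    """ES query body mixing an audit criterion and a document criterion."""
    return {"query": {"bool": {"must": [
        {"term": {"eventId": event_id}},
        {"term": {"doc.type": doc_type}},
    ]}}}


def document_history(entries, doc_id):
    """Return the snapshots of one document, oldest first."""
    matching = [e for e in entries if e["doc"]["id"] == doc_id]
    return [e["doc"] for e in sorted(matching, key=lambda e: e["eventDate"])]


if __name__ == "__main__":
    doc = {"id": "abc", "type": "File", "title": "Draft"}
    first = audit_entry_with_document("documentCreated", "bob", "2014-04-01T10:00:00", doc)
    doc["title"] = "Final"
    second = audit_entry_with_document("documentModified", "bob", "2014-04-02T09:00:00", doc)
    print([d["title"] for d in document_history([second, first], "abc")])
```

The trade-off is index size: every audit entry carries a full copy of the document, which is what makes the history possible but also multiplies the stored data.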

      Relation between Audit and Document ES indexes

      I don't see any reason to want several ES clusters and configurations.

      So, I would say that the existing ES service should make it possible to:

      • configure an index for Audit entries
      • deploy an ES-backed implementation of the Audit Services

      Expected gains

      Having an ES-based Audit service should give us:

      • a *good performance* for querying Audit
      • a nice *reporting* option via tools like Kibana
      • a better *HA solution* (ES is natively HA)
      • a support for full *document History*
      • an ES *native PageProvider* that can make sense globally, not only for Audit

      Implementation steps and timeline

      In terms of timeline, we cannot implement all of this before 5.9.6, so it cannot be part of 6.0.

      However, we could prepare the work so that we can have a `nuxeo-audit-elasticsearch` plugin that can be built as an addon for 6.0.

    Time Tracking

    • Original Estimate: 1 week, 2 days
    • Remaining Estimate: 1 week, 2 days
    • Time Spent: Not Specified