- Type: New Feature
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 5.9.3
- Fix Version/s: 6.0
- Component/s: Elasticsearch
- Tags:
Current Audit Service implementation known limitations
The current Audit Service implementation is based on Hibernate ORM and has several limitations:
- adding or removing fields directly on the entity object is no longer possible
  - we used to have this feature, but it was dropped because it created deployment issues
- the `ExtendedInfo` structure is there to allow adding custom fields via EL, but:
  - storage and retrieval are very slow
  - querying is not efficient
- global performance is not good
  - massive writes are a problem
  - queries are slow when there are a lot of records
- it implies a dependency on an old version of Hibernate
Basically, the current Audit Service is not flexible enough and not fast enough.
Elasticsearch and Audit
From Hibernate to ES
*Compatibility and API*
We have to take the compatibility requirements into account: we don't want to break everything.
This means that:
- we probably want to keep the current `ExtendedInfo` structure
  - even if this does not work well with Hibernate, it should work without issue in ES as long as we use a single JSON object
- we should keep the Service interfaces and simply implement the `nativeQuery` methods using ES DSL
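As a rough illustration of the two points above: an audit entry can be serialized as one JSON object with the `ExtendedInfo` values grouped under a single key, and a lookup that `nativeQuery` expresses today in EJBQL becomes an ES query DSL body. The sketch is in Python only because the DSL is plain JSON. The field names (`eventId`, `docUUID`, `extended`) and both helper functions are assumptions made for the example, not the actual Nuxeo mapping, and the `term` clauses assume these fields are indexed without analysis.

```python
import json


def to_es_document(entry, extended_info):
    """Merge an audit entry and its ExtendedInfo map into one JSON-ready dict.

    Hypothetical helper: the "extended" key is an assumed name.
    """
    doc = dict(entry)
    doc["extended"] = dict(extended_info)
    return doc


def query_by_doc_and_event(doc_uuid, event_id):
    """Build a query DSL body matching entries for one document and one event.

    Hypothetical helper: "docUUID" and "eventId" are assumed field names.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"docUUID": doc_uuid}},
                    {"term": {"eventId": event_id}},
                ]
            }
        }
    }


entry = {"eventId": "documentModified", "docUUID": "1234", "principalName": "jdoe"}
es_doc = to_es_document(entry, {"reviewer": "bob"})
print(json.dumps(es_doc, sort_keys=True))
print(json.dumps(query_by_doc_and_event("1234", "documentModified")))
```

Custom fields stay inside the same JSON object, so adding one does not require any schema change on the entity.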
*Migrating data*
We will need to write a migration system so that people can migrate from Hibernate to ES without losing data.
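One possible shape for such a migration is to read the existing rows in batches and push them through the ES bulk API, which takes a newline-delimited body where each document is preceded by an action line. The sketch below only builds that body. The row layout, the `audit` index name and the `to_bulk_payload` helper are assumptions for illustration, and older ES versions also expect a document type, either in the URL or in the action line.

```python
import json


def to_bulk_payload(rows, index_name):
    """Turn audit rows into the newline-delimited body expected by the ES bulk API.

    Hypothetical helper: each row is assumed to be a dict with an "id" key.
    """
    lines = []
    for row in rows:
        # Reuse the existing primary key as _id so that re-running the
        # migration overwrites entries instead of duplicating them.
        action = {"index": {"_index": index_name, "_id": str(row["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row, sort_keys=True))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


rows = [
    {"id": 1, "eventId": "documentCreated", "docUUID": "1234"},
    {"id": 2, "eventId": "documentModified", "docUUID": "1234"},
]
payload = to_bulk_payload(rows, "audit")
print(payload)
```

Keeping the original identifier makes the migration restartable: a batch that fails halfway can simply be sent again.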
*About PageProviders*
The current Audit PageProvider uses the `nativeQuery` API of the `AuditReader` service: this means the PageProvider won't be automatically migrated from Hibernate to ES, since we probably don't want to use EJBQL to query Audit entries stored in Elasticsearch.
However, as long as we provide a new definition for the default providers, this should be OK.
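For the ES side, a page of results maps naturally onto the `from` and `size` parameters of a search request. A minimal sketch of what a new provider definition would have to produce, assuming an `eventDate` field for sorting (the field name and the `page_query` helper are hypothetical):

```python
def page_query(query, page_index, page_size, sort_field="eventDate"):
    """Wrap a query DSL clause into a search body for one page of results.

    Hypothetical helper: pages are 0-indexed and sorted newest first.
    """
    return {
        "query": query,
        "from": page_index * page_size,  # offset of the first hit to return
        "size": page_size,               # number of hits per page
        "sort": [{sort_field: {"order": "desc"}}],
    }


# Third page (index 2) of 20 entries for a given document.
body = page_query({"term": {"docUUID": "1234"}}, page_index=2, page_size=20)
print(body)
```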
*Tx behavior*
Of course, the ES backend won't be transactional: this should not be an issue since most usages of `AuditWriter` go through an async listener.
However, we have some synchronous accesses (like the ones done by the AuthenticationFilter or the Automation API): we can probably use the "near realtime" API of ES for that.
ES index content
We probably want to have a dedicated index for Audit in addition to the one we created for Documents.
However, it may be useful to be able to query ES on both Document and Audit data.
From my understanding, we cannot do joins in ES; however, we could store the Document itself as a nested object inside the Audit entry.
In addition to making complex queries possible, it will open the way for a complete document history storage (it has been asked for several times).
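To make the idea concrete, here is a sketch of an audit entry that embeds a snapshot of the document, and of a query that filters on both audit and document fields at once. The `doc` key, the `title` field and both helpers are assumptions for illustration. The dotted path `doc.title` works when the snapshot is mapped as a plain object field; a mapping using the ES `nested` type would need a `nested` query instead.

```python
def with_document_snapshot(audit_entry, document):
    """Return a copy of the audit entry with the document embedded under "doc".

    Hypothetical helper: "doc" is an assumed key name.
    """
    entry = dict(audit_entry)
    entry["doc"] = dict(document)
    return entry


def entries_for_titled_documents(event_id, title_text):
    """Build a query combining an audit field and an embedded document field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"eventId": event_id}},
                    {"match": {"doc.title": title_text}},
                ]
            }
        }
    }


snapshot_entry = with_document_snapshot(
    {"eventId": "documentModified", "docUUID": "1234"},
    {"title": "Quarterly report", "lifecycleState": "project"},
)
print(snapshot_entry)
print(entries_for_titled_documents("documentModified", "report"))
```

Since each entry carries the state of the document at the time of the event, the sequence of entries for one `docUUID` doubles as a history of that document.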
Relation between Audit and Document ES indexes
I don't see any reason to want several ES clusters and configurations.
So, I would say that the existing ES service should allow us to:
- configure an index for Audit entries
- deploy an ES-backed implementation of the Audit Services
Expected gains
Having an ES-based Audit service should give us:
- *good performance* for querying Audit
- a nice *reporting* option via tools like Kibana
- a better *HA solution* (ES is natively HA)
- support for full *document History*
- an ES *native PageProvider*, which can make sense globally, not only for Audit
Implementation steps and timeline
In terms of timeline, we cannot implement all of that before 5.9.6, so it cannot be part of 6.0.
However, we could prepare the work so that we can have a `nuxeo-audit-elasticsearch` plugin that can be built as an addon for 6.0.