[NXP-28917] Explicit Benchmark approach - Nuxeo Issue Tracker

Details

Type: Question
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: Performance

Epic Link:
10B Benchmark

Description

Lessons learned

We have done benchmarks in the past in order to validate that our principles for scalability were working:

we did a benchmark to verify that Application Level Sharding allows handling more documents
- 1B docs with 10 PGSQL based repositories
we did a benchmark to verify that Infrastructure Level Sharding allows handling more documents
- 1B docs with 1 repository but Sharded MongoDB cluster
we did a benchmark to verify that we can isolate workloads
- 1B docs with DAM use cases

This gives us confidence in our ability to scale Nuxeo and isolate the workloads to be able to deliver the right performance depending on the actual use cases.

However, what we have not identified yet are the different scalability steps:

when do we need to start sharding at the infrastructure level
when do we need to start sharding at the application level
how far can we go

This is the very goal of the current benchmark campaign.

Context matters

Because Nuxeo is a very configurable Platform the type of application and the business use cases have a very significant impact on performance and scalability aspects.

We want this benchmark to be realistic of a use case where 10B documents can be used.

The initial idea is to simulate a "bank-like use case" and store Bank Statements for a large number of customers.

For that, we need to define a Studio project that will define the content model.

Statements will be generated as PDF because this is the only practical solution we have at scale(see NXP-28765)

More info related to the content model, the filing plan and the tests:

Benchmarking steps

We want to go through all the "scalability" in a logical order:

step 1: *non-sharded environment*
- optimize to leverage as much as possible the hardware
step 2: *sharding at the infrastructure level*
- leverage MongoDB and Elastic sharding capabilities
- see how far we can go without having a cluster that becomes hard to manage
step 3: *sharding at application level*
- leverage Nuxeo multi-repositories feature
- leverage multiple MongoDB or Elasticsearch clusters

The idea is to progressively grow the volume of documents and move from step 1 to step 3 as needed.

100M documents
0.5B documents
2B documents
5B documents
10B documents
...

Optimizing for non-sharded environment

Sharding is a way to handle scalability, but it comes with a price in terms of hardware resources and management complexity. So, we want to delay as much as possible the need to move from step 1 to the next steps.
For that, we want to leverage as much as possible the hardware so that we can go as far as possible with step 1.

Indexes size

When targetting billions of documents inside the repository one of the obvious limiting factors comes from the capacity of the database to handle the queries in a timely manner.

To mitigate that aspect we can:

offload more queries to Elasticsearch
add more indexes inside MongoDB

Indexes are key to have good performance but this is only true if the database can maintain a significant part of the indexes in memory. If the database needs to always read the indexes from the hard drive, unless we are using expensive NVMe storage it will be slow.

This means that one way to improve database performance for Nuxeo is to minimize the indexes size.
Documents are identified and linked together using their UUID, so it is useful to indexes UUID and fields referencing UUIDs (like parentId or links).
As a result, providing a way to reduce how we encode and index Nuxeo UUIDs is a way to increase the number of documents we can handle on given hardware.

Hence the work on NXP-28763.

Fulltext storage

When full-text indexing is activated, by default, Nuxeo stores the text content extracted from the blob inside the database.
The goal is to allow to rebuild the full-text index without having to go through the full-text extraction process for all the blobs.

This approach has the disadvantage that it makes the DB storage grow faster with the number of documents.

We still believe that storing the full-text is a good trade-off, but we are working on other storage options in order to diminish the pressure on the DB.

The associated task is NXP-26704

Thumbnails and conversions

The default policy inside Nuxeo is to compute and store all conversions:

thumbnails are automatically generated and stored
pictures and video renditions are automatically generated and stored

In the context of repositories with a few million documents, this seems to be a sensible policy.

However, when reaching several billion, we may want to reconsider the trade-off:

is this worth paying the additional storage if only a small percentage of these renditions will ever be accessed?
is on-the-fly generation not a better alternative?

Optimizing Sharding

Transparent Sharding vs Functional Sharding

The default approach is to use the Nuxeo Document UUID as a sharding key.
This approach has several benefits:

sharding is completely transparent
shards are statistically well balanced as long as UUIDs generation is well distributed

However, this approach also comes with some limitations:

queries cost tends to grow with the number of shards
- sent the query to all nodes + merge and sort results
adding a new shard requires a cluster-wide re-balancing

Custom Sharding key vs Application-level sharding

Inside MongoDB do not need to rely on the doc UUID to define sharding, we could also use a custom meta-data.

However, for this meta-data to be usable as a sharding key:

it needs to be present on every single document
values need to be properly balanced

If we identify such a partitioning scheme at the functional level, it is probably much more efficient to switch to multi-repositories:

each "logical shard" can be managed and configured separately
- storage speed, DB types, hardware resources
indexing and storage are sharded using the same axis

Impact on the benchmark steps

As a result, the scale-out steps will be:

single repository - no sharding
single repository - one MongoDB collection sharded on UUID
multi-repositories - multiple MongoDB collections sharded on UUID (on multiple MongoDB Cluster)

Validating performances

Goal

For each step in the benchmark, we want to be able to check the level of performance so that we can determine if we can add more documents on the same infrastructure or if we need to scale out first.

REST and CRUD

For very basic REST API calls that perform CRUD operation, we can learn from the previous benchmarks, and what we see is that:

the database is the main bottleneck
- some databases more than others
having more than one Nuxeo node is useful
- because of HTTP threads-pool
- because otherwise, we do not test the cluster invalidation
but after 2 nodes, the impact of adding Nuxeo nodes is small (if not null)

So, in the context of this benchmark:

running simple REST CRUD test is meaningful for testing throughput and response time
we do not need more than 2 Nuxeo nodes

Streaming and Synchronous processing

When we have a very large repository a lot of operations can result in a massive amount of asynchronous processing:

re-indexing
bulk-update on a query
ACLs update

Kafka as the target benchmark configuration

Theoretically, we could choose either Redis or Kafka as the backend for the asynchronous processing, however, clearly, Kafka is the solution we want to test:

we know Kafka will not be the bottleneck
- Kafka can handle much bigger throughput than what we need
- Redis is limited by memory and memory is expensive
benchmarks show us that with Kafka we better handle the back-pressure
- this even impacts positively the CRUD test

Workload isolation

The previous benchmarks, especially the DAM one, have demonstrated clearly that we can easily scale out asynchronous processing by adding Nuxeo nodes.

The DAM benchmark also showed that we can handle priorities between the different streams of work.

In the context of this benchmark, we want to validate a "simple configuration" so that we can be sure that "bulk and import" activity will not impact negatively interactive processing.

Explicit Benchmark approach