[NXP-23076] define target deployment architecture: Multi-AZ vs Multi-region - Nuxeo Issue Tracker

Details

Type: Task
Status: In Progress
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: ADDONS_9.10, ADDONS_10.10
Component/s: Clustering

Epic Link: Architecture blueprint for HA&DR
Backlog priority: 1,000

Description

Multi-Datacenter: Multi-AZ vs Multi-Region

When discussing HA (High Availability), many people talk about using multiple data centers.

However, "multi-datacenter" by itself is not a precise enough description.
What mainly drives the deployment architecture is how the data centers are connected:

Data centers connected via a high-speed, low-latency network
- this is a Multi-Availability Zone (Multi-AZ) deployment
- we can spread a single architecture across the data centers
Data centers connected via a WAN (or any network with significant latency)
- this is a Multi-Region deployment
- we need to deploy 2 copies of the architecture and provide asynchronous data replication between them

Multi-AZ Deployment

Principles

The main goal of this type of deployment is High Availability: ensuring that the service keeps running if one zone goes down.

Since AZs usually have independent power sources, Internet access and cooling systems, deploying across multiple AZs is good for HA and can also provide some DRP (Disaster Recovery Plan) options.

However, since AZs are by definition geographically close to one another, a Multi-AZ deployment does not really protect against a disaster (earthquake, power grid failure, etc.), so it should not be seen as a complete DRP solution.

Constraints

The main constraint concerns the network between the data centers, which must be:

reliable
high speed (Gb/s)
low latency (< 1 ms)
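The three constraints above can be turned into a quick pre-deployment check on a measured link. This is only an illustrative sketch: the `LinkMeasurement` type, its fields, and the approximation of "reliable" as a packet-loss ceiling (with a hypothetical 0.1% value) are assumptions and not part of any Nuxeo tooling; only the 1 Gb/s and 1 ms thresholds come from the list above.

```python
from dataclasses import dataclass

# Thresholds from the constraints listed above.
MIN_BANDWIDTH_GBPS = 1.0
MAX_LATENCY_MS = 1.0
# Assumption: "reliable" approximated by a packet-loss ceiling (hypothetical value).
MAX_PACKET_LOSS_PCT = 0.1


@dataclass
class LinkMeasurement:
    """Hypothetical measurements of the link between two data centers."""
    bandwidth_gbps: float
    latency_ms: float  # round-trip latency
    packet_loss_pct: float


def multi_az_violations(link: LinkMeasurement) -> list[str]:
    """Return the Multi-AZ constraints this link does not meet (empty list = all met)."""
    violations = []
    if link.packet_loss_pct > MAX_PACKET_LOSS_PCT:
        violations.append("reliable")
    if link.bandwidth_gbps < MIN_BANDWIDTH_GBPS:
        violations.append("high speed")
    if link.latency_ms >= MAX_LATENCY_MS:
        violations.append("low latency")
    return violations


def deployment_type(link: LinkMeasurement) -> str:
    """Multi-AZ only if every constraint holds; otherwise treat the sites as separate regions."""
    return "Multi-AZ" if not multi_az_violations(link) else "Multi-Region"


if __name__ == "__main__":
    same_metro = LinkMeasurement(bandwidth_gbps=10, latency_ms=0.4, packet_loss_pct=0.0)
    across_wan = LinkMeasurement(bandwidth_gbps=1, latency_ms=80, packet_loss_pct=0.05)
    print(deployment_type(same_metro), multi_az_violations(same_metro))
    print(deployment_type(across_wan), multi_az_violations(across_wan))
```

A single failed constraint is enough to fall back to the Multi-Region model, since spreading one cluster over such a link would break the assumptions the services rely on.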

The HA architecture relies on HA services. A fault-tolerant service needs a majority of its nodes to reach the consensus required for leader election, so it is deployed with an odd number of nodes, the minimum being 3.
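The majority rule can be written out explicitly. For a service with n nodes, the quorum q is a strict majority, which fixes the number f of node failures the service can tolerate while still electing a leader (n, q and f are notation introduced here for illustration):

```latex
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1,
\qquad
f(n) = n - q(n) = \left\lfloor \frac{n-1}{2} \right\rfloor

f(1) = f(2) = 0, \quad f(3) = f(4) = 1, \quad f(5) = f(6) = 2
```

Adding a node to an odd-sized cluster raises the quorum without raising f, which is why odd sizes are used, and 3 is the smallest size that survives the loss of one node.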

Surviving a zone outage therefore requires 3 zones. With only 2 zones, an odd number of nodes means one zone hosts more nodes than the other; if that zone goes down, the nodes on the other side cannot elect a leader because they do not form a majority, and the service becomes unavailable.
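To make the zone argument concrete, the sketch below checks whether a given spread of one consensus-based service's nodes over zones still leaves a majority after any single zone is lost. It is an illustrative helper only; the function names and the zone labels (`az-a`, `az-b`, `az-c`) are hypothetical.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n nodes."""
    return n // 2 + 1


def survives_any_zone_loss(nodes_per_zone: dict[str, int]) -> bool:
    """True if the remaining nodes still form a majority after losing any one zone.

    nodes_per_zone maps a (hypothetical) zone name to the number of nodes of a
    single consensus-based service running in that zone.
    """
    total = sum(nodes_per_zone.values())
    needed = majority(total)
    # Losing a zone removes all of its nodes at once.
    return all(total - lost >= needed for lost in nodes_per_zone.values())


if __name__ == "__main__":
    print(survives_any_zone_loss({"az-a": 2, "az-b": 1}))             # False
    print(survives_any_zone_loss({"az-a": 1, "az-b": 1, "az-c": 1}))  # True
    print(survives_any_zone_loss({"az-a": 2, "az-b": 2}))             # False
```

The first case is the 2-zone situation described above: losing `az-a` leaves 1 node out of 3, below the quorum of 2. Adding a fourth node over 2 zones does not help either, since a 2 + 2 split leaves 2 nodes where 3 are needed.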

Deployment Architecture

The goal is an Active / Active / Active deployment: each of the three zones actively serves requests.
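One way to picture an Active / Active / Active layout is a placement in which every zone runs at least one node of every service, so any zone can keep serving requests. The sketch below spreads nodes round-robin over three zones and checks that property; the zone names and the service list (`nuxeo`, `database`, `search`) with their node counts are hypothetical placeholders, not a prescribed Nuxeo topology.

```python
from collections.abc import Iterable
from itertools import cycle

# Hypothetical zones and node counts per service, for illustration only.
ZONES = ["az-1", "az-2", "az-3"]
SERVICES = {"nuxeo": 3, "database": 3, "search": 3}


def spread(services: dict[str, int], zones: list[str]) -> dict[str, dict[str, int]]:
    """Assign each service's nodes to zones in round-robin order."""
    placement: dict[str, dict[str, int]] = {zone: {} for zone in zones}
    for service, count in services.items():
        zone_iter = cycle(zones)
        for _ in range(count):
            zone = next(zone_iter)
            placement[zone][service] = placement[zone].get(service, 0) + 1
    return placement


def is_active_active_active(placement: dict[str, dict[str, int]], services: Iterable[str]) -> bool:
    """True if every zone runs at least one node of every service."""
    wanted = list(services)
    return all(
        zone_services.get(service, 0) >= 1
        for zone_services in placement.values()
        for service in wanted
    )


if __name__ == "__main__":
    layout = spread(SERVICES, ZONES)
    print(layout["az-2"])  # {'nuxeo': 1, 'database': 1, 'search': 1}
    print(is_active_active_active(layout, SERVICES))  # True
```

With 3 nodes per service and 3 zones, each zone hosts exactly one node of each service, which is also the 1 + 1 + 1 split that keeps a majority after any single zone loss.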

Multi-Region Deployment

Principles

When deploying Nuxeo across multiple regions, the goal should be disaster recovery: ensuring that the company / application can keep working even if a region goes down completely.

For DRP, two metrics drive the target architecture:

RTO: Recovery Time Objective
- maximum time before the system is up again
- in other words, the maximum acceptable downtime
RPO: Recovery Point Objective
- maximum amount of data that can be lost
- usually expressed as a time window: how far back the recovered data may be
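For the simplest strategy, periodic backups copied off-site, both metrics can be estimated from the backup schedule. The notation is introduced here for illustration: Δ is the interval between backups, t_copy the time to ship a backup off-site (assumed shorter than Δ), and t_detect, t_restore the time to detect the outage and to restore service from the last backup. The worst case is a disaster striking just before a new backup finishes its off-site copy, so the last safe backup is then Δ + t_copy old:

```latex
\mathrm{RPO}_{backup} \approx \Delta + t_{copy},
\qquad
\mathrm{RTO}_{backup} \approx t_{detect} + t_{restore}

\Delta = 24\,\mathrm{h},\; t_{copy} = 1\,\mathrm{h}
\;\Rightarrow\; \mathrm{RPO}_{backup} \approx 25\,\mathrm{h}
```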

If RTO and RPO are large, a simple externalized backup can be enough.

However, when RTO is below 1 hour and RPO is a few minutes, a dedicated architecture is needed.
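The decision between a plain backup and a dedicated architecture can be sketched as a comparison of worst-case figures against the targets. This assumes a periodic off-site backup whose worst-case data loss is one backup interval plus the off-site copy time; the `BackupPlan` type, its fields and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupPlan:
    """Hypothetical parameters of a periodic off-site backup, in minutes."""
    interval_min: float  # time between two backups
    copy_min: float      # time to ship a backup off-site (assumed < interval)
    restore_min: float   # time to detect the outage and restore from the backup


def backup_is_enough(plan: BackupPlan, rto_min: float, rpo_min: float) -> bool:
    """True if a plain off-site backup meets both objectives in the worst case."""
    worst_data_loss = plan.interval_min + plan.copy_min
    worst_downtime = plan.restore_min
    return worst_data_loss <= rpo_min and worst_downtime <= rto_min


def recommend(plan: BackupPlan, rto_min: float, rpo_min: float) -> str:
    if backup_is_enough(plan, rto_min, rpo_min):
        return "externalized backup"
    return "dedicated DR architecture"


if __name__ == "__main__":
    nightly = BackupPlan(interval_min=24 * 60, copy_min=60, restore_min=4 * 60)
    print(recommend(nightly, rto_min=48 * 60, rpo_min=48 * 60))  # externalized backup
    print(recommend(nightly, rto_min=60, rpo_min=5))             # dedicated DR architecture
```

Shrinking the backup interval alone rarely reaches an RPO of a few minutes, which is why tight targets push towards continuous (asynchronous) replication instead.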

Limitations: not HA

DRP is about replicating data between data centers in different regions. Because network latency prevents this replication from being synchronous, one site is always "behind" the other, which means that the lagging site cannot be used to serve user requests.

Put another way: because replication is asynchronous and subject to network latency, a failover always comes with a window of potential data loss.
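The data-loss window can be illustrated with a toy model of asynchronous replication. The simplifying assumption, which real systems do not guarantee, is that every commit reaches the replica exactly `lag` seconds after it was made on the primary; the function name and the sample timestamps are hypothetical.

```python
def lost_on_failover(commit_times: list[float], failover_at: float, lag: float) -> list[float]:
    """Commits made on the primary that had not reached the replica at failover time.

    Toy model: a commit made at time t is applied on the replica at t + lag.
    """
    return [t for t in commit_times if t <= failover_at < t + lag]


if __name__ == "__main__":
    commits = [0, 1, 2, 3, 4, 5]
    print(lost_on_failover(commits, failover_at=5, lag=2))  # [4, 5]
    print(lost_on_failover(commits, failover_at=5, lag=0))  # []
```

Everything committed during the last `lag` seconds before the failure is lost, so the replication lag is effectively the achieved RPO.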

This latency also requires duplicating services, because a cluster cannot be stretched across regions.

Another reason for using multiple regions may be to optimize geographical delivery, but for that the Nuxeo approach relies on CDNs and upload accelerators, which are not part of the architecture discussed below.
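Why a consensus cluster cannot be stretched across regions can be seen from a lower bound on its write latency: the leader must wait for acknowledgements from enough followers to form a majority before a write is committed. The sketch below computes that bound from follower round-trip times, ignoring disk and processing time; the RTT values are hypothetical.

```python
def commit_latency_ms(follower_rtts_ms: list[float]) -> float:
    """Lower bound on how long a leader waits before a write is committed.

    In a majority-quorum cluster of n nodes the leader counts as one vote, so it
    needs acks from n // 2 followers and must wait for the slowest of the
    fastest n // 2 of them. Disk and processing time are ignored.
    """
    cluster_size = len(follower_rtts_ms) + 1
    acks_needed = cluster_size // 2  # majority (n // 2 + 1) minus the leader itself
    if acks_needed == 0:
        return 0.0  # single-node "cluster": no replication round trip
    return sorted(follower_rtts_ms)[acks_needed - 1]


if __name__ == "__main__":
    print(commit_latency_ms([0.5, 0.8]))     # 0.5  (3 nodes over 3 AZs)
    print(commit_latency_ms([80.0, 150.0]))  # 80.0 (3 nodes over 3 regions)
```

Spreading the nodes over regions so that losing one region keeps a majority means every write pays at least one inter-region round trip, on top of the risk of partitions on the WAN.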

Constraints

Since HA is usually also needed, the target is a mixed architecture with:

a 3-AZ deployment
a DRP deployment in a second region
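The mixed architecture can be summarized as a small topology description with a few sanity checks derived from the sections above: 3 AZs on the primary side for HA, a DR site in a different region, and asynchronous replication between them. The `Site` and `Topology` types and the region and zone names are hypothetical, meant only to make the rules explicit.

```python
from dataclasses import dataclass


@dataclass
class Site:
    region: str
    zones: list[str]


@dataclass
class Topology:
    primary: Site
    dr: Site
    replication: str = "async"


def check_topology(t: Topology) -> list[str]:
    """Return the rules this topology breaks (empty list = consistent with the target)."""
    problems = []
    if len(set(t.primary.zones)) < 3:
        problems.append("primary site needs 3 AZs to survive a zone outage")
    if t.dr.region == t.primary.region:
        problems.append("DR site must be in another region")
    if t.replication != "async":
        problems.append("cross-region replication must be asynchronous")
    return problems


if __name__ == "__main__":
    target = Topology(
        primary=Site("region-a", ["a1", "a2", "a3"]),
        dr=Site("region-b", ["b1"]),
    )
    print(check_topology(target))  # []
```

The DR site is not required to span 3 AZs here, since it is not serving traffic until a failover happens.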

Deployment Architecture

Focus for this Epic

During testing, we will evaluate both the Multi-AZ (HA) failover scenario and the Multi-Region (DR, high-latency) failover scenario.

People

Assignee: Thierry Delprat
Reporter: Thierry Delprat
Participants: Thierry Delprat
Votes: 0
Watchers: 4

Dates

Created: 2017-09-13 20:58
Updated: 2019-01-11 10:44