  1. Nuxeo Platform
  2. NXP-23076

define target deployment architecture: Multi-AZ vs Multi-region

      Description

      Multi-Datacenter: Multi-AZ vs Multi-Region

      When discussing HA, many people mention using multiple data centers.

      However, "multi-datacenter" by itself is not precise enough.
      The main impact on the deployment architecture comes from the way the data centers are connected:

      • data centers are connected via a high-speed, low-latency network
        • this is a Multi-Availability Zone (Multi-AZ) deployment
        • we can spread a single architecture across the data centers
      • data centers are connected via a WAN (or any network with significant latency)
        • this is a Multi-Region deployment
        • we need to deploy 2 copies of the architecture and provide asynchronous data replication between them
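
      The distinction above can be sketched as a small classification helper. This is only an illustration: the function and constant names are hypothetical, and the 1 ms cut-off is an assumption borrowed from the "low latency (< 1 ms)" constraint listed in the Multi-AZ section of this ticket.

```python
from enum import Enum


class DeploymentType(Enum):
    MULTI_AZ = "multi-az"          # one architecture spread across data centers
    MULTI_REGION = "multi-region"  # two copies plus asynchronous replication


# Assumption: 1 ms is used as the cut-off, taken from the Multi-AZ
# "low latency (< 1 ms)" constraint; a real assessment would also
# look at bandwidth and reliability.
MAX_AZ_LATENCY_MS = 1.0


def classify_deployment(inter_dc_latency_ms: float) -> DeploymentType:
    """Classify a multi-datacenter setup from its inter-datacenter latency."""
    if inter_dc_latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if inter_dc_latency_ms < MAX_AZ_LATENCY_MS:
        return DeploymentType.MULTI_AZ
    return DeploymentType.MULTI_REGION
```

      For example, `classify_deployment(0.4)` yields `DeploymentType.MULTI_AZ`, while a 35 ms WAN link yields `DeploymentType.MULTI_REGION`.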

      Multi-AZ Deployment

      Principles

      The main goal of this type of deployment is High Availability: if one zone goes down, the service must keep running.

      Since AZs usually have separate power sources, Internet access and cooling systems, deploying across multiple AZs is good for HA and can also provide some DRP options.

      However, since AZs are by definition geographically close to each other, a Multi-AZ deployment does not really protect against a regional disaster (earthquake, power grid failure, ...), so it should not be seen as a complete DRP solution.

      Constraints

      The main constraint is on the network between the data centers, which must be:

      • reliable
      • high speed (Gb/s)
      • low latency (< 1 ms)

      An HA architecture relies on HA services. A fault-tolerant service needs a majority of its nodes to reach the consensus required for leader election, so it is deployed with an odd number of nodes, the minimum being 3.

      Surviving a zone outage requires 3 zones. With only 2 zones, one zone necessarily holds more nodes than the other; when that zone goes down, the remaining side cannot elect a leader because it has no majority, so the service becomes unavailable.
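
      The majority argument can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a simple strict-majority quorum rule; the function names are hypothetical and not tied to any specific clustering product.

```python
from typing import Sequence


def has_quorum(alive_nodes: int, total_nodes: int) -> bool:
    """A strict majority of the configured nodes must still be reachable."""
    return alive_nodes > total_nodes // 2


def survives_any_zone_outage(nodes_per_zone: Sequence[int]) -> bool:
    """True if losing any single zone still leaves a majority of nodes."""
    total = sum(nodes_per_zone)
    return all(has_quorum(total - lost, total) for lost in nodes_per_zone)


print(survives_any_zone_outage([2, 1]))     # 3 nodes over 2 zones -> False
print(survives_any_zone_outage([1, 1, 1]))  # 3 nodes over 3 zones -> True
```

      With 3 nodes over 2 zones, losing the zone that holds 2 nodes leaves 1 node out of 3, which is not a majority. With 1 node in each of 3 zones, any single outage leaves 2 out of 3.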

      Deployment Architecture

      The goal is to have an Active / Active / Active deployment.

      Multi-Region Deployment

      Principles

      When deploying Nuxeo across multiple regions, the goal should be disaster recovery: making sure that the company / application can continue working even if a region goes down completely.

      When talking about DRP, there are several metrics that impact the target architecture:

      • RTO: Recovery Time Objective
        • maximum time before the system can be up again
        • this is basically the maximum down time
      • RPO: Recovery Point Objective
        • maximum amount of data that can be lost

      Obviously, if the RTO and RPO are large, a simple externalized backup can be a solution.

      However, when the RTO is below 1 hour and the RPO is a few minutes, we need a dedicated architecture.
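
      The rule of thumb above can be written down as a small decision helper. This is a sketch under stated assumptions: the ticket only says "below 1h" and "a few minutes", so the 5-minute RPO threshold and all names here are hypothetical.

```python
from datetime import timedelta

# Assumptions: "below 1h" comes from the text above; 5 minutes is a
# hypothetical stand-in for "a few minutes".
RTO_THRESHOLD = timedelta(hours=1)
RPO_THRESHOLD = timedelta(minutes=5)


def needs_dedicated_dr_architecture(rto: timedelta, rpo: timedelta) -> bool:
    """Return True when both objectives are tight enough that an
    externalized backup is unlikely to be sufficient."""
    return rto < RTO_THRESHOLD and rpo <= RPO_THRESHOLD
```

      For example, an RTO of 30 minutes with an RPO of 2 minutes calls for a dedicated architecture, while an RTO of 24 hours with an RPO of 12 hours can be covered by a backup. Cases where only one of the two objectives is tight are not addressed by the text, so this helper simply follows the stated condition.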

      Limitations: not HA

      DRP is about replicating data between data centers in different regions. Because of network latency this replication cannot be synchronous, so one site is always "behind" the other and cannot be used to serve user requests.

      Put another way: because replication is asynchronous and subject to network latency, there is a window of data loss in case of failover.
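
      The data-loss window can be illustrated with a toy model. This is a minimal sketch, assuming a constant replication lag (a simplification; real lag varies with load and network conditions); the function name and the timestamps are hypothetical.

```python
from typing import List, Sequence


def writes_lost_on_failover(
    write_times: Sequence[float],
    failover_time: float,
    replication_lag: float,
) -> List[float]:
    """Return the writes committed on the primary but not yet on the replica.

    A write made at time t reaches the replica at t + replication_lag, so
    anything written after (failover_time - replication_lag) is lost.
    """
    replicated_up_to = failover_time - replication_lag
    return [t for t in write_times if replicated_up_to < t <= failover_time]


writes = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # seconds, hypothetical write timestamps
print(writes_lost_on_failover(writes, failover_time=5.0, replication_lag=2.5))
# [3.0, 4.0, 5.0]
```

      In this model the achievable RPO is bounded below by the replication lag: every write made during the last `replication_lag` seconds before the failover is missing on the surviving site.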

      This latency also means that services must be duplicated, because a cluster cannot be stretched across regions.

      Another reason to use multiple regions may be to optimize geographical delivery, but for that the Nuxeo approach relies more on CDNs and upload accelerators, and that is not the architecture discussed below.

      Constraints

      Since HA is usually also needed, we need a mixed architecture with:

      • a 3-AZ deployment
      • 1 DRP deployment
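
      The mixed architecture can be described as a small data model with a sanity check. This is only a sketch: the class names, the `role` values, and the region and zone names used in the example are hypothetical placeholders, not actual Nuxeo configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    region: str
    zones: List[str]
    role: str  # "active" serves users; "standby" only receives replicated data


@dataclass
class Topology:
    primary: Site
    drp: Site

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the layout
        matches the mixed HA + DRP architecture described above."""
        problems = []
        if len(self.primary.zones) < 3:
            problems.append("primary site needs 3 AZs for HA")
        if self.drp.region == self.primary.region:
            problems.append("DRP site must be in a different region")
        if self.drp.role != "standby":
            problems.append("DRP site lags behind and must not serve users")
        return problems


# Hypothetical region and zone names, for illustration only.
topology = Topology(
    primary=Site("region-a", ["region-a-1", "region-a-2", "region-a-3"], "active"),
    drp=Site("region-b", ["region-b-1"], "standby"),
)
print(topology.validate())  # []
```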

      Deployment Architecture

      Focus for this Epic

      During our testing, we will evaluate both the Multi-AZ (HA) failover scenario and the Multi-Region (DR, with high latency) failover scenario.

            People

            • Assignee: Thierry Delprat (tdelprat)
            • Reporter: Thierry Delprat (tdelprat)