About five hundred million years ago, creatures of the ocean started venturing onto land. Over many iterations, evolution gave a select few creatures the ability to live both on land and in the water. Feeling native in both places was a radical breakthrough and the key to the success of these hybrid animals. Most fish, however, remain confined to the oceans today. No one would take a fish in an aquarium, place the aquarium in the middle of the marshes, and call it adaptation or migration. It’s an uncomfortable lift-and-shift! Life (and more importantly death) has taught nature to shed fragile contraptions and favor designs that have native resiliency and efficiency built in. Through a series of explorations and mutations, the true crocodiles arrived only about 200 million years ago. They have since ruled the planet and proved to be the hardiest of the large animals.
In the tech industry, evolution happens too, but at a much more rapid clip. About a decade ago, enterprise IT infrastructure was presented with a challenge: move to the cloud, driven by the desire to reduce CapEx, increase agility, and expand businesses rapidly at a global scale. A serendipitous mutation called Hyper Converged Infrastructure (HCI) happened around the same time and brought cloud-like features to enterprise infrastructure. The three-tier architecture, just like the fish, has not been able to survive in the cloud. Early attempts by search companies to use three-tier architectures with SAN and NAS appliances did not succeed. HCI ushered in a new cloud-like architecture in the enterprise, and on-premises data centers are now rapidly deploying HCI. While this first transition is maturing, some of the HCI architectures are “coming out of the ocean” and forming Hybrid Clouds, some more boldly than others.
Just like in nature, for any engineering architecture to succeed, it needs to be simple and efficient.
AWS is a leader in cloud computing and makes it simple to bring the Nutanix HCI software stack to the cloud by providing API-driven access to its bare metal servers. While there are many different ways of bringing an HCI stack on top of AWS bare metal, at Nutanix we have chosen the most native approach to the problem, with the conviction that such an architecture is in the best interest of our joint customers.
Nutanix brings a first-of-a-kind hybrid cloud offering that delivers true hybridity and true elasticity. Let us take a deeper look:
- Many customers have existing AWS accounts. True hybridity calls for using the existing AWS accounts, VPCs, VPNs, Direct Connects while bringing the private and public clouds together. With Xi Clusters, current customers of AWS can leverage their existing environments and launch Nutanix Enterprise Cloud OS within their current environments, without the need to create a new AWS account, VPCs or WAN networking.
- True hybridity also means bringing cloud-native services together with the classic apps and containers running on Nutanix Enterprise Cloud OS, without the need to go through inefficient network gateways or VPC peering. With Xi Clusters, not only can the classic apps be on the same subnets as the cloud-native services and apps, but they can also get native network performance with minimal overhead. This also simplifies migrating apps from Xi Clusters to native AWS EC2 and vice versa, without the need for IP address changes or any network reconfiguration.
- One of the key aspects of hybrid is to be able to manage both the private and public sides of infrastructure through the same console, without adding management overheads on the public cloud side. Xi Clusters simply brings up AOS nodes in AWS bare metal while managing them from existing Prism Central and imposing no networking management VMs or networking gateway VMs.
- Cloud infrastructure should allow quick burstability. AWS provides an elastic bare metal service in EC2. Xi Clusters allows customers to spin up clusters on demand and in minutes. The cloud infrastructure is available from AWS at an hourly granularity. As the capacity requirement of a cluster increases or decreases, nodes can be added or removed on demand.
- Cloud infrastructure should allow for the sporadic nature of business without the need to recreate or migrate the assets each time. Xi Clusters allows a running cluster, along with its VMs, to be hibernated into AWS S3 for any period of time, another industry-first feature. During hibernation, no compute costs are incurred. Whenever the workloads are required to run again, the Xi Cluster can be resumed and all the workloads are brought back to life. This allows for an elastic infrastructure for seasonal but stateful workloads.
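To make the hibernation point concrete, here is a rough back-of-the-envelope sketch in Python. The node count, the per-node hourly rate, and the 90-day season are all made-up inputs for illustration, not AWS or Nutanix prices, and the sketch deliberately ignores the S3 storage cost of the hibernated state and any software charges.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def compute_cost(nodes, hourly_rate, active_hours):
    """Compute cost when nodes are billed per node-hour only while running."""
    return nodes * hourly_rate * active_hours


# Hypothetical inputs: 4 nodes at an assumed rate of 5.0 (any currency) per node-hour.
NODES = 4
ASSUMED_HOURLY_RATE = 5.0

# Cluster left running all year.
always_on = compute_cost(NODES, ASSUMED_HOURLY_RATE, HOURS_PER_YEAR)

# Same cluster running only for an assumed 90-day peak season, hibernated otherwise.
seasonal = compute_cost(NODES, ASSUMED_HOURLY_RATE, 24 * 90)

print(always_on, seasonal, round(seasonal / always_on, 3))
```

Under these assumptions the seasonal cluster pays for 90 of 365 days of compute, roughly a quarter of the always-on figure. The real saving depends on actual rates and on how much state sits in S3 during hibernation.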
Xi Clusters Design Choices
We had a choice between creating a new AWS account for the customer or using an existing one to manage Xi Clusters. Using a new AWS account would have given us a clean working space and made it somewhat easier to build the product. However, from a customer’s perspective that would have been less optimal, because customers could not use their existing accounts and credits with AWS. Hence, we decided not to burden the customer with new AWS accounts. Their current accounts can be used. The customer is billed directly by AWS for the infrastructure spend and pays Nutanix only for the Nutanix software, for the duration the Xi Clusters are in use.
We had a choice between deploying the Nutanix VMs on an overlay network (using VXLAN) on top of the AWS subnets, or deploying the Nutanix VMs directly on the AWS subnets. Deploying an overlay network provides for easier integration with the underlying cloud networking because nothing needs to change in the way the hypervisor does IP address management.
However, choosing an overlay presents many challenges:
- Running an overlay requires management VMs (at least a controller and a couple of Network Edge gateways). That overhead runs against our simple and efficient mantra.
- Encapsulating traffic adds non-trivial CPU overhead, and achieving bandwidths higher than 10 Gbit/s becomes hard.
- When IP addresses on the overlay talk to IP addresses in native AWS EC2, the traffic goes through the Network Edge gateways. That creates a performance bottleneck, and if the gateways are not scaled out (causing additional overhead), it may lead to downtime during upgrades.
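The CPU cost of encapsulation is hard to pin down without measuring a specific setup, but the fixed per-packet header cost is easy to show. The sketch below assumes standard VXLAN over IPv4 with no VLAN tags or IP options; the function names are ours, chosen for illustration.

```python
# Header sizes in bytes for VXLAN over IPv4 (no VLAN tags, no IP options).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14

ENCAP_BYTES = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET  # 50


def inner_mtu(underlay_mtu):
    """Largest inner IP packet that fits in one underlay IP packet."""
    return underlay_mtu - ENCAP_BYTES


def payload_fraction(underlay_mtu):
    """Share of each full-size underlay packet left for the inner IP packet."""
    return inner_mtu(underlay_mtu) / underlay_mtu


print(inner_mtu(1500))                   # 1450
print(round(payload_fraction(1500), 4))  # 0.9667
```

On a 1500-byte underlay, about 3% of every full-size packet goes to encapsulation headers before any CPU time is spent building or parsing them. A native model avoids both costs.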
Hence, we decided to explore a more native integration with AWS EC2 networking. This new native networking model has the following features:
- No overlay is needed, hence there are no VMs acting as network controllers or network edge gateways. Zero management VMs are needed, which saves expensive resources in the cloud and also reduces management complexity.
- The VMs running on Nutanix AHV are assigned IP addresses that are provided by native AWS networking and recognized by AWS switching fabric.
- When VMs talk to each other within the Nutanix Xi Clusters or to native EC2 VMs, they do not have to go through any gateways but rather are directly switched by AWS. This allows user VMs to talk natively to cloud services without going through any translation of packets from overlay to underlay. This results in high performance and low latency networking.
- To achieve the above, AHV has been modified to add deep integration for AWS networking.
Xi Clusters Architecture
Xi Clusters are designed to look virtually the same as on-premises Nutanix clusters. These clusters run the complete Nutanix AOS and AHV stack with no change in CLI, UI or APIs. This allows existing IT processes or 3rd party integrations that work on-premises to continue to work with Xi Clusters in AWS.
With Xi Clusters, the complete Nutanix HCI stack runs directly on the AWS EC2 bare metal instances. The bare metal runs the AHV hypervisor and just like any on-premises deployment, runs a Controller Virtual Machine (CVM) with direct access to NVMe instance storage hardware. The Nutanix AOS software provides high-performance, low latency and highly available storage using these local NVMe disks. Xi Clusters running in AWS can be managed by an existing on-premises Prism Central or from a Prism Central deployed on Xi Clusters in AWS.
Customers can deploy Xi Clusters in AWS from their my.nutanix.com account. They can perform day-to-day cluster management via Prism Central and use my.nutanix.com account for creation, hibernation, deletion, and billing of their clusters in AWS.
Xi Clusters in AWS look similar to Nutanix clusters on-premises. A cluster can have 3 or more EC2 i3.metal bare metal instances. AHV runs directly on the bare metal and exposes the local NVMe storage to CVMs. The CVMs on each instance cluster together and provide a single storage fabric across all nodes, with all the enterprise storage capabilities that enterprise apps need. The storage fabric in Xi Clusters can be connected to on-premises clusters using Nutanix AOS DR, backup, and replication capabilities, allowing seamless mobility of stateful applications from on-premises to AWS and back.
Not only can Xi Clusters natively interact with the on-premises clusters at the storage layer, they can also extend Nutanix hybrid cloud storage capabilities like Volumes, Files, and Buckets to native workloads running on AWS EC2. This allows for workloads that require DR or backup to on-premises but need to leverage compute from AWS EC2 for bursting.
AHV runs an efficient embedded distributed network controller that integrates user VM networking with AWS networking. The network controller does not create an overlay network. The way it works is that all user VM IPs are assigned to the bare metal host where the VMs happen to run. The AHV embedded network controller simply forwards the packets from the host to the right VM on the host or wherever it might have migrated to. IP address management is integrated with AWS VPC, hence all user VM IPs are allocated by AWS from the AWS subnets in the existing VPCs.
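The idea can be pictured with a small toy model in Python. This is only an illustration of the mapping described above, not the AHV network controller itself. The class name, method names, host names, and the example address are all hypothetical.

```python
class HostIpMap:
    """Toy model of 'each user VM IP lives on the host that runs the VM'."""

    def __init__(self):
        # vm_ip -> name of the bare metal host currently holding that IP
        self._host_of_ip = {}

    def place(self, vm_ip, host):
        """Record that a VM with this IP has started on the given host."""
        self._host_of_ip[vm_ip] = host

    def migrate(self, vm_ip, new_host):
        """Move the IP to the host the VM migrated to; the IP itself is unchanged."""
        if vm_ip not in self._host_of_ip:
            raise KeyError(f"unknown VM IP: {vm_ip}")
        self._host_of_ip[vm_ip] = new_host

    def host_for(self, dst_ip):
        """Return the host that should receive packets for dst_ip, or None."""
        return self._host_of_ip.get(dst_ip)


ip_map = HostIpMap()
ip_map.place("10.0.1.25", "host-a")    # example address from an assumed VPC subnet
before = ip_map.host_for("10.0.1.25")
ip_map.migrate("10.0.1.25", "host-b")  # VM moves: same IP, new host
after = ip_map.host_for("10.0.1.25")
print(before, after)
```

The point of the model is that the VM keeps its AWS-allocated address when it moves. Only the host that holds the address changes, so nothing has to be translated between an overlay and the underlay.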
No additional configuration is required for AHV user VMs to access AWS services and other EC2 instances, or for EC2 instances to connect directly to services on AHV user VMs using their assigned IP addresses. The network supports near-native networking performance between AHV user VMs and EC2 instances.
As a result of the above architecture, there is no need for network controller VMs, network edge gateways or any other management VMs to be run on the Nutanix Xi Clusters. Microsegmentation is implemented via the embedded networking controller in AHV and the policies are managed from Prism Central which could be running on-premises or in the cloud on the Xi Clusters.
What remains the same compared to on-premises?
Almost all features, APIs, and the user experience remain the same. The clusters in AWS act as a true extension of the on-prem DC. There is seamless mobility from on-premises to the cloud and back, without any changes to the applications.
What is different between Nutanix On-premises and Xi Clusters?
Six Use Cases of Xi Clusters
Successful enterprises routinely look to expand the presence of their business apps quickly into regions where they do not yet have a physical datacenter footprint. This needs to be done without any change to the apps. With Xi Clusters, enterprises can choose from the numerous AWS global regions to rapidly expand their presence worldwide.
Many enterprises have to deal with fragmented data center assets and would like to consolidate or shut down some underperforming ones. Having the option to consolidate into a nearby AWS availability zone with elastic capacity, without having to worry about the compatibility of apps with the new cloud environment, is ideal for such transitions. Xi Clusters offers the ability to run enterprise apps as-is on the AWS infrastructure while managing the applications from a single Prism console, which can be either on-prem or in the cloud.
Retail companies see steep seasonal peaks in infrastructure demand. Since adding new nodes on-premises is a long process, many enterprises are forced to maintain infrastructure that is underutilized for most of the year. Using Xi Clusters, enterprises can now keep utilization high during non-peak seasons by reducing the overall infrastructure footprint and bursting into the cloud in the peak seasons. Since the Xi Clusters in AWS and the on-premises clusters are on the same network and storage fabric, migrating applications back and forth is convenient.
Some classic apps require access to shared disks with SCSI-PR support. Xi Clusters enables these apps to run inside AWS. This flexibility allows a new class of applications to migrate to the public cloud.
High-end databases, big data apps, or other I/O-hungry applications may not be able to get enough storage IOPS in the cloud-native environment without changing their architecture or spending heavily on provisioned IOPS. Xi Clusters brings the high storage IOPS capability of AOS on top of AWS bare metal servers, allowing a new class of I/O-hungry applications to move to the cloud.
When migrating applications to the cloud, a common challenge faced by customers is that not all components of the app can be made cloud-ready at the same time. The app ends up split over a long distance between on-prem components and in-cloud components, which leads to latency issues. This frustrates many cloud migration efforts. With Xi Clusters, app components can be moved to the cloud as-is, without any changes, to sit in close proximity to their cloud-native components. The unique integration Xi Clusters provides with native AWS networking enables high-bandwidth, low-latency streaming between the app components on the Nutanix infrastructure on AWS and those on native AWS VMs.
Xi Clusters on AWS is expected to be available in Technical Preview this summer. If you are interested in participating in the Technical Preview of Xi Clusters, please send an email to firstname.lastname@example.org.