The Anti-converged System: 10 Steps to a Disaggregated Data Center

 
 
By Chris Preimesberger  |  Posted 2015-11-03
A disaggregated data center—a well-connected but intentionally decoupled IT system—is often more flexible and has fewer underutilized resources.

Transportable Environments

When the data center is viewed as a collection of resources, the workloads that run on those resources should be portable and easy to move. Whether a workload runs in a container (e.g., Docker, orchestrated by a system such as Kubernetes), in a virtual machine or via a batch-processing framework, it should be responsive and largely independent of the hardware on which it runs. This allows the data center to migrate workloads and optimize resource utilization.
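
To make the idea concrete, here is a minimal Python sketch of a transportable worker: all of its configuration comes from environment variables and its state is checkpointed outside the process, so it can be stopped in one place and resumed in another. The WORK_QUEUE_URL and CHECKPOINT_PATH names are illustrative assumptions, not any particular platform's conventions.

```python
import json
import os
import time

# All configuration comes from the environment, not from the host the worker
# happens to land on. WORK_QUEUE_URL and CHECKPOINT_PATH are hypothetical names.
QUEUE_URL = os.environ.get("WORK_QUEUE_URL", "tcp://queue.internal:5555")
CHECKPOINT_PATH = os.environ.get("CHECKPOINT_PATH", "worker-checkpoint.json")


def load_checkpoint():
    """Resume from externalized state, so the workload can be stopped on one
    host and restarted on another without losing its place."""
    try:
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"processed": 0}


def save_checkpoint(state):
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump(state, f)


def main():
    state = load_checkpoint()
    for _ in range(5):              # stand-in for "pull next task from the queue"
        state["processed"] += 1
        save_checkpoint(state)      # state lives outside the host, not on it
        time.sleep(0.1)
    print(f"worker attached to {QUEUE_URL}; processed so far: {state['processed']}")


if __name__ == "__main__":
    main()
```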

OSI Optimization

The network should be flexible enough to support workloads and hardware that are moving and changing. Typically, software-defined networking (SDN) and network functions virtualization (NFV) are the go-to concepts in this area; as the environment shifts, the network should shift along with it, without the need for manual reconfiguration or human intervention.
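
As a purely illustrative sketch (not any real controller's API), the following Python shows the shape of that idea: when a workload-migration event arrives, the network programming follows it automatically, with no manual switch reconfiguration. FlowController and its methods are hypothetical stand-ins for an SDN controller interface.

```python
# Illustrative only: "FlowController" is a hypothetical stand-in for a real
# SDN controller API, not any particular product.
class FlowController:
    def __init__(self):
        self.flows = {}

    def install_flow(self, workload_id, host, vlan):
        self.flows[workload_id] = {"host": host, "vlan": vlan}
        print(f"flow for {workload_id} now points at {host} (vlan {vlan})")

    def remove_flow(self, workload_id):
        self.flows.pop(workload_id, None)


def on_workload_migrated(controller, workload_id, new_host, vlan=100):
    # The event handler reprograms the fabric; no human edits a switch config.
    controller.remove_flow(workload_id)
    controller.install_flow(workload_id, new_host, vlan)


if __name__ == "__main__":
    ctl = FlowController()
    on_workload_migrated(ctl, "web-frontend-7", "rack3-node12")
```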

Embarrassing Parallelism

The concept of "embarrassing parallelism" maps naturally onto the disaggregated data center. In parallel computing, an embarrassingly parallel workload—or embarrassingly parallel problem—is one that requires little or no effort to separate into a number of parallel tasks. This is often the case when there is no dependency (or communication) between those parallel tasks. In an environment where workloads are transportable and the network is flexible, it is imperative for service designers to build around the embarrassingly parallel aspects of their applications: How can a workload be divided up and distributed to a pool of data center resources that can be scaled up and down as load increases and decreases?
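
For example, Python's standard concurrent.futures module makes the pattern easy to see: independent chunks of work are handed to a pool of workers with no communication between tasks, and the pool can be widened or narrowed as load changes. The checksum task here is a toy stand-in for real work.

```python
from concurrent.futures import ProcessPoolExecutor


def checksum(chunk):
    """An independent task: no communication with any other chunk."""
    return sum(chunk) % 65521


if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]

    # Each chunk can be handed to any available worker; scaling is just a
    # matter of changing max_workers (or, in a disaggregated data center,
    # the number of nodes behind the pool).
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(checksum, chunks))

    print(results)
```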

Fault Expectancy

In any data center, failures happen. In a modern disaggregated data center, failures are to be expected. Just as a robust stand-alone application handles read/write failures, transient resource unavailability and unexpected shutdowns, services for the disaggregated data center must expect any and all resources to become temporarily unavailable, and must be able to recover from and adapt to changes in the discrete resources they depend on.
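
A minimal sketch of that mindset, assuming a flaky resource that occasionally refuses connections: rather than failing on the first error, the caller backs off and retries, and only escalates once the fault appears persistent.

```python
import random
import time


def with_retries(operation, attempts=5, base_delay=0.5):
    """Call an operation that may hit a temporarily unavailable resource,
    backing off (with jitter) and retrying instead of failing outright."""
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # the fault persisted; let a higher layer adapt
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)


def flaky_read():
    # Stand-in for a read against a resource that is briefly unavailable.
    if random.random() < 0.5:
        raise ConnectionError("storage node not reachable")
    return "payload"


if __name__ == "__main__":
    print(with_retries(flaky_read))
```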

The CAP Theorem and Eventual Consistency

Eric Brewer's CAP (consistency, availability and partition-tolerance) Theorem is taught in almost any good college-level distributed systems course, and should be battle-tested knowledge for any service architect. The theorem states that a distributed system cannot simultaneously guarantee all three CAP properties; in practice, when a network partition occurs, the system must trade consistency against availability. While some debate exists as to how hard and fast these rules are, they are parameters by which service architects live and die. Eventual consistency is one concept that helps on both the consistency and availability fronts, and it tends to be more broadly applicable than many assume. Partition-tolerance can be tricky, and resource and network partitions can wreak havoc on many eventual-consistency implementations. However, robust service design with CAP in mind is paramount to the existence of the disaggregated data center.
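
One of the simplest illustrations of eventual consistency is a grow-only counter, a basic conflict-free replicated data type: replicas accept updates independently, even during a partition, and converge once they merge. The sketch below is a toy version for illustration only.

```python
# A toy grow-only counter (G-counter): each replica increments only its own
# slot, and any two replicas merge by taking element-wise maximums. After a
# partition heals, merging in either order converges to the same value.
class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        for rid, value in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), value)

    def value(self):
        return sum(self.counts.values())


if __name__ == "__main__":
    a, b = GCounter("rack1"), GCounter("rack2")
    a.increment(3)          # updates made on either side of a partition
    b.increment(5)
    a.merge(b)
    b.merge(a)
    assert a.value() == b.value() == 8   # replicas converge
```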

Look South Into All Subparts of the Rack

The modern disaggregated data center is composed of racks full of resources. It is imperative for service managers and designers to be able to look programmatically southbound into the rack: specifically, to enumerate, monitor and control all subparts of all components in that rack. This requires granular application programming interfaces (APIs) that expose this information. Ideally, a single, powerful southbound API should be made available, though in some cases a variety of APIs are cobbled together (e.g., IPMI and SNMP). Beyond the resources themselves, placing sensors at the rack and component levels also provides a valuable glimpse into what is going on in a rack in a disaggregated data center.
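
What a unifying southbound interface might look like is sketched below; the class and method names are hypothetical, and a real implementation would sit on top of IPMI, SNMP or vendor APIs rather than returning canned values.

```python
# Hedged sketch of a single southbound interface that papers over IPMI, SNMP
# and vendor APIs underneath. Names are hypothetical, not a real product's API.
from abc import ABC, abstractmethod


class RackComponent(ABC):
    @abstractmethod
    def enumerate_subparts(self):
        """List fans, PSUs, DIMMs, drives, NICs, sensors, etc."""

    @abstractmethod
    def read_sensors(self):
        """Return a dict of sensor name -> reading."""

    @abstractmethod
    def set_power_state(self, state):
        """Control the component ('on', 'off', 'cycle')."""


class IpmiServer(RackComponent):
    """Illustrative only: a real implementation would speak IPMI to the BMC
    instead of returning hard-coded values."""
    def __init__(self, bmc_address):
        self.bmc_address = bmc_address

    def enumerate_subparts(self):
        return ["cpu0", "cpu1", "dimm0-15", "psu0", "psu1", "fan0-5"]

    def read_sensors(self):
        return {"inlet_temp_c": 24.5, "psu0_watts": 310.0}

    def set_power_state(self, state):
        print(f"{self.bmc_address}: power {state}")


if __name__ == "__main__":
    node = IpmiServer("10.0.3.12")
    print(node.enumerate_subparts())
    print(node.read_sensors())
```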

Look North of the Rack, Too

Granular southbound resource information is one thing, but without a northbound component to help put the pieces together, it can turn into a firehose of information with no context. Resources in a disaggregated data center do not exist in a vacuum; workloads, environmental characteristics, and cross-rack and cross-data-center factors all come into play. A good way of looking north of the rack is to consider how to package and aggregate southbound information into a shape that is more easily consumable and actionable up the chain. However, this should not be confused with monitoring and automation: looking northbound really means determining what information is needed to manage the resources and how to get that information where it needs to go.
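
As a rough illustration of that packaging step, the sketch below rolls a list of per-component readings up into a single rack-level summary that a northbound consumer can act on; the field names are assumptions made for the example.

```python
# Illustrative only: collapse a firehose of per-component readings into a
# rack-level summary that a northbound consumer (scheduler, DCIM tool,
# capacity planner) can actually act on.
from statistics import mean


def summarize_rack(rack_id, component_readings):
    """component_readings: list of dicts such as
    {"component": "node12", "inlet_temp_c": 24.5, "power_watts": 310.0}"""
    temps = [r["inlet_temp_c"] for r in component_readings]
    watts = [r["power_watts"] for r in component_readings]
    return {
        "rack": rack_id,
        "components": len(component_readings),
        "avg_inlet_temp_c": round(mean(temps), 1),
        "max_inlet_temp_c": max(temps),
        "total_power_watts": sum(watts),
    }


if __name__ == "__main__":
    readings = [
        {"component": "node01", "inlet_temp_c": 23.0, "power_watts": 290.0},
        {"component": "node02", "inlet_temp_c": 26.5, "power_watts": 340.0},
    ]
    print(summarize_rack("rack-7", readings))
```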

Monitor Everything

Most data centers involve varying degrees of automated monitoring; having a technician walk the aisles with a clipboard just doesn't scale. In a disaggregated data center, it is critical to monitor every aspect of every resource: sensorification (which is surprisingly inexpensive), device-specific data points (which come essentially for free via device and OS APIs) and broader environmental characteristics (e.g., building management system and sensor data). The more that is monitored, the better the picture that can be drawn of the overall state of the data center, from heat maps and resource-utilization mapping to customer billing and failure postmortems. The more that is monitored in a disaggregated data center, the better, cheaper and more resilient the services that can be built on top of it.
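
A toy version of "monitor everything" might look like the following: poll every known source on a schedule and record uniform, timestamped samples. The sources here are faked with random numbers; in practice they would be BMC sensors, OS counters and building-management feeds.

```python
# Minimal sketch: poll every known source and record uniform
# (timestamp, source, value) samples. Sources are faked for illustration.
import random
import time


def fake_sources():
    return {
        "rack7/node01/inlet_temp_c": lambda: 22 + random.random() * 6,
        "rack7/node01/cpu_util_pct": lambda: random.random() * 100,
        "room_a/humidity_pct": lambda: 40 + random.random() * 10,
    }


def collect(samples, sources):
    now = time.time()
    for name, read in sources.items():
        samples.append((now, name, round(read(), 2)))


if __name__ == "__main__":
    samples = []
    sources = fake_sources()
    for _ in range(3):          # three polling cycles
        collect(samples, sources)
        time.sleep(0.1)
    # Samples like these feed heat maps, utilization maps, billing and postmortems.
    for sample in samples[:5]:
        print(sample)
```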

Automate Everything

Partial data center automation is not uncommon. Tools like Puppet, Chef and Ansible have removed much of the pain and manual labor from part of the equation, but there is always room to take more of the costly, error-prone human decision-making out of the loop. In the disaggregated data center, with the capabilities and principles defined in the previous sections, it should be possible to automate everything from workload migration to environmental and building controls, based on operational insight and measurements that go beyond the traditional, but still useful, PUE (power usage effectiveness) metric, which equals total facility power divided by IT equipment power.
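
For reference, the PUE arithmetic is simple; the sketch below works one invented example (1,800 kW of total facility power against 1,200 kW of IT equipment power gives a PUE of 1.5).

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power.
    A value of 1.0 would mean every watt goes to IT gear; real facilities
    run higher because of cooling, power conversion and other overhead."""
    return total_facility_kw / it_equipment_kw


if __name__ == "__main__":
    # Invented numbers for illustration only.
    print(round(pue(total_facility_kw=1800.0, it_equipment_kw=1200.0), 2))  # 1.5
```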

Intelligent Metric-Driven Decision Making

In a modern disaggregated data center, it becomes possible to focus on metric-driven decisions. When you have well-built services that expect failure and can be easily migrated, running on hardware and in an environment that are heavily instrumented and easily orchestrated, it becomes possible to assert and drive decisions around metrics such as performance per watt per dollar. In the disaggregated data center, data comes from every aspect of the facility, and based on that data, intelligent decisions can be made about how resources are used and managed.
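
A hedged sketch of what such a metric-driven decision could look like: score each candidate resource pool by performance per watt per dollar and place the workload on the winner. The candidate numbers are invented for illustration.

```python
def perf_per_watt_per_dollar(ops_per_sec, watts, dollars_per_hour):
    """Simple composite score: higher is better."""
    return ops_per_sec / (watts * dollars_per_hour)


def choose_placement(candidates):
    # Pick the resource pool with the best performance per watt per dollar.
    return max(candidates, key=lambda c: perf_per_watt_per_dollar(
        c["ops_per_sec"], c["watts"], c["dollars_per_hour"]))


if __name__ == "__main__":
    candidates = [
        {"name": "rack3/pool-a", "ops_per_sec": 90_000, "watts": 400, "dollars_per_hour": 0.80},
        {"name": "rack9/pool-b", "ops_per_sec": 70_000, "watts": 250, "dollars_per_hour": 0.65},
    ]
    print(choose_placement(candidates)["name"])   # rack9/pool-b wins on this metric
```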
 

Converged IT systems—stuffing more and more functionality into smaller and smaller data center components—have been a trend for the past half-dozen years. As the world moves toward the Internet of Things, data centers must be viewed as collections of resources that can evolve to address the changing requirements of enterprise workloads. Thus, being able to replace, repair or otherwise swap out software and components independently becomes more important. This has led to the concept of a disaggregated data center, a well-connected but intentionally decoupled IT system. Networking and storage are often purchased and configured separately from servers; disaggregating systems goes deeper, also targeting the processing, random-access memory and I/O subsystems. Hyperscale cloud service providers, for example, are interested in disaggregation because they see it as more flexible, with fewer underutilized resources. This eWEEK slide show, incorporating input from Cole Crawford and Andrew Cencini, co-founders of modular data center specialist Vapor IO, offers 10 steps IT managers need to follow to move toward this new model.
