Rethinking Compute Power in the Data Center

 
 
By eweek  |  Posted 2005-09-29
 
 
 



Azul Systems Inc., in Mountain View, Calif., wants to do with processing power what others have done in storage and networking: create an environment where servers can access a ready pool of compute power when needed. The company in April unveiled the first generation of its Compute Appliance products, massive machines with up to 384 processors that speed Java performance and can help enterprises looking to consolidate their data centers and reduce IT costs by doing the work of numerous low-end servers. President and CEO Stephen DeWitt recently spoke with eWEEK Senior Editor Jeffrey Burt about Azul's data center philosophy, its products and its future.

What is network-attached processing?

One of the reasons we call it "network-attached processing" is very much a play off of "network-attached storage," because I think that is something that people can look back to and gauge its impact inside their environment over the last decade.

Network-attached processing, in its simplest terms, is the ability for existing server infrastructure, whether it be Intel [Corp.]-based servers or Unix-based servers, to mount an external pool of computing power without modification. That external pool of computing power was built from the ground up for the way people build applications today, using virtual-machine environments like Java, J2EE [Java 2 Enterprise Edition], .Net, etc. So as existing infrastructure mounts this pool, it takes advantage of a class of infrastructure that's highly optimized for those workloads. Probably the biggest end-user benefit of mounting external processing power is that it eliminates the need to capacity-plan at the individual application level.

Just as your notebook is able to mount terabytes of external storage, a two-way Xeon box can mount a compute pool and literally have the processing power of the largest systems in the market transparently without the customer having to do anything.


You've said that you want to do with compute power what other companies have done with storage and networking.

In the world of compute, things have to change. We've had pretty much the same computing model since the mid-'60s. What this means for end users is that as they size their infrastructure for the applications they're deploying, they size it along one of two axes: they either scale horizontally by clustering a whole bunch of small-denomination compute bricks, or they buy big iron. I think the industry as a whole knows the good, the bad and the ugly associated with both horizontal and vertical scale.

We have an opportunity right now to eliminate a lot of that architectural inefficiency. If you accept as a given that the world is moving to virtual-machine environments (and that's a pretty safe assumption, given that it's the strategy of just about everybody: Microsoft [Corp.], IBM, Oracle [Corp.], BEA [Systems Inc.], SAP [AG], etc.), then the concept we've pioneered is very viable.

Just as NFS [Network File System] and other open, standards-based protocols allowed us to transparently mount external storage, the world of virtual machines gives us the opportunity to separate the function of compute from the computer, and by doing so to let existing infrastructure mount this big shared pool. We talk so much about utility computing. I don't think anything in the computing world in the last "n" number of years has been hyped as much as utility computing. But if you really do believe that the function of compute can be a utility (and we very much subscribe to that), then some very fundamental things have to change. It starts with the underlying architecture and with the way processing power is delivered, [and] it also involves the economics of processing. It's not just how, it's also how much, and we think that's a real key.


Azul gets customers onboard with a disruptive technology

Analysts have called Azul's technology fairly disruptive. What is Azul doing to convince users to try it out?

This is probably the most disruptive technology that's come across the compute landscape, certainly in my professional career, which dates back to the mid-'80s. The most important thing right now is going out there and testing. It was just about a year ago … that we broke silence on all of this, and over the course of the last year, we have worked very closely with key partners like IBM and BEA, JBoss [Inc.], Oracle, the key J2EE-class vendors. All of them have seen our gear, and all of them have tested our gear. We just announced certification with BEA, so we've had … the J2EE community, which stands to benefit enormously from this, beating on our gear for a long, long time.

We've also had a number of key integrator partners, like EDS [Electronic Data Systems Corp.], beating on our gear: companies that know a heck of a lot about data centers, about provisioning and about the state of the utility offerings from the traditional systems vendors. Then, as we got closer to our first customer ship and general availability, we've been bringing our gear into some of the most complex data centers around the globe, not just here in the U.S. These are customers in financial services (the Wall Street types), the big global logistics companies, the big telecommunications companies both here and abroad, Internet properties, [and] high-transaction businesses. … The early adopters are customers who are conservative by nature but feel the pain of trying to scale applications in a highly unpredictable world and don't want to continue down the road of an inefficient model of horizontal or vertical scaling.

The only way you gain confidence in any new technology is to prove it, prove it, prove it, prove it. You're going to see a number of benchmark results coming out from Azul over the next few weeks and months. … We're very pleased with the performance we've been able to deliver.


You released the Compute Appliance in April. How many of these are in actual production, and how many are still in test mode?

A little bit of both. I would say the majority are in pre-production and there are probably a handful of customers that are in production.

One of the elegant elements of this technology is the way customers can deploy it. You can't bring a new technology into a big data center, like the Fortune 500-class enterprise data centers, if it requires tremendous change. You have to bring something that can be implemented transparently. Network-attached processing, much like network-attached storage, can be mounted into your existing environment. What we're seeing in the early production uses is that customers are mounting the compute pools behind existing clusters as a buffer and as extra capacity.

Think about it. Suppose you have an application with four instances across a four-node cluster, and you're seeing wild peaks and valleys in compute usage. Instead of adding more and more blades, two-ways or four-ways, which just perpetuate the inefficiencies of that model, people will simply back-end that cluster to the compute pool.
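The arithmetic behind the burst-to-pool pattern just described looks like this. The Python sketch below is illustrative only. The demand figures are hypothetical, and split_load is a name invented here rather than part of any Azul product. The sketch assumes demand can be measured in "node-equivalents" per interval. For each interval, it separates the work the four-node cluster can absorb from the work that would overflow to a shared pool.

```python
# Hypothetical demand per interval, in node-equivalents of work.
# Illustrative numbers only, not measurements.
demand = [2.5, 3.0, 6.5, 4.0, 9.0, 3.5, 1.0]

CLUSTER_NODES = 4  # the four-node cluster from the example above


def split_load(demand: list[float], local_capacity: float) -> tuple[list[float], list[float]]:
    """Split each interval into work the cluster serves and overflow sent to the pool."""
    local = [min(d, local_capacity) for d in demand]
    overflow = [max(d - local_capacity, 0.0) for d in demand]
    return local, overflow


local, overflow = split_load(demand, CLUSTER_NODES)
print(overflow)       # [0.0, 0.0, 2.5, 0.0, 5.0, 0.0, 0.0]
print(max(overflow))  # 5.0
```

With these made-up numbers, the cluster stays at four nodes and borrows at most five node-equivalents at the worst peak. The alternative would be a cluster sized at nine nodes that sits mostly idle.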

That gives them the opportunity to see the host reduction factor that we enable. It also lets them gauge performance. It also allows them to understand any sort of networking traffic, latency or I/O issues that exist. It gives people a real-world opportunity to see this. As they gain confidence, then they can start consolidating their data centers.

Customers are very anxious about the profitability of their data centers, and if you look at the period between 1995 and 2005, there has been an enormous build-out. In these last 10 years, data centers have absolutely exploded. You sit across from [executives at] the flagship-branded telecommunications companies and banks, and they tell you they're out of space. More importantly, they tell you that the profitability of those data centers has hit an all-time low: what they've invested vs. the return they're getting from the applications they spin out of them, whether those applications run the business, generate revenue streams or run trading floors. What they're looking for are technologies that they can get to know, that they can embrace and that they can [deploy] inside their environment to dramatically reduce the cost structure of their data centers. Taking 10 percent out means absolutely nothing to them. They're looking for things to change. So we come in with this vision and this story of network-attached processing, and we say, "If you look over the 10 years to come, 2005 to 2015, our vision of computing is that small denominations (the two-ways, the four-ways, the eight-ways, etc.) become an irrelevant metric."

By 2015, our vision is that applications, whether they be small, big, mission-critical or back-office, are able to tap into an enormous bucket of processing power that's built for that workload. Instead of buying servers, you're buying the service of processing. It's the same with storage and networking now: people don't buy in small denominations. They buy into a fabric that they can tap into and share.

[With the rise of service-oriented architectures], the need for this model becomes even more critical, because now you're not trying to capacity plan around applications, you're trying to capacity plan around services. That's really hard to do. It's difficult enough for an IT manager with 50 different applications from all these different business units to capacity plan at the individual application level, much less around some shared service, like an identity service. So this sort of shared fabric, which eliminates the pain of peaks and valleys, has huge ramifications in terms of data center profitability, and that's what customers are resonating with.

Our technology doesn't require any religion. There's no [operating system] religion: we don't care if you're a Linux shop, a Solaris shop or a Windows shop. We have no binary overhead, and we don't carry instruction-set baggage like you do in x86 or any of the other microprocessor architectures. We're big, mountable power that can be injected or ejected within seconds, and by doing so you eliminate capacity planning around compute.


Reduced power consumption is critical for ROI

You mentioned the desire among businesses to get a return on their investment. However, your technology is pricey. How do you address the cost issue with customers?

Our 96-way has a list price of $89,000, so if you think about that on a per-processor basis, you're talking about $1,200 per processor. That's pretty cheap. Go look at your favorite Dell [Inc.] box and look at what it costs on a per-processor basis. Part of our whole economic value proposition was to reset the commodity line in the industry. As an industry, we all admire Dell and the sophistication of its business model. Dell made 18.6 points in gross margin last quarter, and it's the kingpin in terms of operational efficiency.
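The sketch below checks the per-processor economics by dividing the list prices quoted in this interview by processor count. It assumes "half-a-million" means exactly $500,000 for the 384-way. It also ignores support, networking and host costs. On those assumptions, the 96-way works out to roughly $930 per processor, below the $1,200 figure cited, and the 384-way to roughly $1,300.

```python
# List prices quoted in the interview. The 384-way price ("half-a-million")
# is assumed here to be exactly $500,000.
configs = {
    "96-way": (89_000, 96),
    "384-way": (500_000, 384),
}

# Price per processor for each configuration.
per_processor = {name: price / procs for name, (price, procs) in configs.items()}

for name, cost in per_processor.items():
    print(f"{name}: ${cost:,.0f} per processor")
# 96-way: $927 per processor
# 384-way: $1,302 per processor
```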

But if you think about that as the commodity line in the industry, part of the hypothesis around Azul was that we would be able to fundamentally transform the economics associated with processing power. That involves a lot of things. First of all, it does involve your traditional price/performance metrics. We have to be very [strong on price/performance] relative to existing technology, but the big home run for Azul is the total cost of ownership.

Any infrastructure play, whether it's storage, networking, database, etc., eventually boils down to a TCO play. Our capital costs are extremely competitive, ranging from $89,000 for a 96-way up to half a million dollars for a 384-way, which is pretty amazing relative to traditional big iron. But the big win for us is, first off, that customers are seeing significant host-reduction factors. We are seeing host-reduction factors of at least 3X in virtually every evaluation or existing customer we've gone to. This means they're able to reduce their front-end host count by a factor of three. In many cases, we've seen host-reduction factors well into the double digits.

Plus, how many people are required to manage 50 servers? If you've got 50 1U (1.75-inch) boxes, just boxes that you could buy running Linux and BEA and an instance of an application you've authored, there are bodies associated with that. There's a huge human factor associated with that.
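This sketch translates the host-reduction claim into server counts for the 50-box example. It applies a reduction factor and rounds up, because a fraction of a server still means a whole machine. hosts_after_reduction is a hypothetical helper. The 12X factor stands in for the unspecified "double-digit" cases. Neither comes from Azul.

```python
import math


def hosts_after_reduction(current_hosts: int, reduction_factor: float) -> int:
    """Front-end hosts still needed after offloading work to a compute pool.

    Rounds up, since a partial server still has to be a whole machine.
    """
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return math.ceil(current_hosts / reduction_factor)


print(hosts_after_reduction(50, 3))   # 17: the "at a minimum 3X" case
print(hosts_after_reduction(50, 12))  # 5: a hypothetical double-digit factor
```

Each box removed also takes its share of the administrative "bodies" with it, which is the human factor raised above.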

Then there's power and space. Our 96-way consumes only 700 watts of power. Our 384-way consumes only 2½ kilowatts. So in a standard 42U (73.5-inch) rack, we can put enormous power in a very small footprint. In New York, where you're charged 18 cents per kilowatt-hour, or London, where you're charged 25 cents per kilowatt-hour, this gear pays for itself in what you're saving on power relative to comparable capacity in the old building-block approach.
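The sketch below turns the electricity figures into an annual bill. It assumes each appliance draws its quoted rating continuously, 24 hours a day for 365 days. It uses the per-kilowatt-hour rates quoted above for New York and London. It excludes cooling. It also leaves out the comparison fleet, because the interview does not give the power draw of the servers being replaced.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes the appliance never idles down


def annual_energy_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost for a constant draw at the rated wattage."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * price_per_kwh


ratings = {"96-way": 700, "384-way": 2500}  # watts, as quoted
rates = {"New York": 0.18, "London": 0.25}  # dollars per kWh, as quoted

for model, watts in ratings.items():
    for city, rate in rates.items():
        cost = annual_energy_cost(watts, rate)
        print(f"{model} in {city}: ${cost:,.0f} per year")
```

On these assumptions, the 96-way costs roughly $1,100 a year to power in New York and $1,500 in London. The 384-way costs roughly $3,900 and $5,500.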

I find it amazing to watch the traditional server vendors talk about power savings when you look at what network-attached processing delivers relative to existing servers. We're talking multiple orders of magnitude here, not only in pure power savings but also in real estate. Our 384-way is in an 11U [19.25-inch] form factor, a little bit bigger than a bread box. And our 96-way is in a 5U [8.75-inch] form factor. So when you start looking at the ecosystem costs, the human costs, the power and space, you start to get a sense of how overwhelming this value proposition is.
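The rack-unit heights follow from 1U = 1.75 inches. They also give an estimate of how much of this gear fits in the standard 42U rack mentioned earlier. The sketch assumes the whole rack is filled with appliances of one size. It ignores space for switches, power distribution and airflow, so real densities would be lower. rack_capacity is a hypothetical helper, not an Azul sizing tool.

```python
INCHES_PER_U = 1.75  # one rack unit
RACK_UNITS = 42      # the standard rack cited in the interview

# (height in U, processors, watts), as quoted in the interview
appliances = {
    "96-way": (5, 96, 700),
    "384-way": (11, 384, 2500),
}


def rack_capacity(height_u: int, processors: int, watts: int,
                  rack_units: int = RACK_UNITS) -> tuple[int, int, int]:
    """Appliances, total processors and total watts in one full rack."""
    units = rack_units // height_u
    return units, units * processors, units * watts


for name, (height_u, procs, watts) in appliances.items():
    units, total_procs, total_watts = rack_capacity(height_u, procs, watts)
    print(f"{name}: {height_u * INCHES_PER_U:.2f} in tall; "
          f"{units} per rack = {total_procs} processors at {total_watts / 1000:.1f} kW")
```

On these assumptions, a rack of 96-ways holds 768 processors drawing 5.6 kW. A rack of 384-ways holds 1,152 processors drawing 7.5 kW.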


But what really takes the old way of doing things off the table is the fact that you eliminate the need to capacity plan at the individual application level. If you're a bank and you have 1,200 applications, every one of those applications requires capacity planning. How much power are you going to need next Thursday at 4 o'clock? And everybody over-provisions, because the one thing IT is never going to do is under-provision. You're always going to have overage there, so you're always going to have underutilization: the 8, 9, 10, 11 percent utilization rates. It's just the way things are right now, but things need to change. People need data centers to be more profitable. They need to start seeing 50, 60, 70 percent utilization of their server infrastructure, and yet have capacity available at microsecond-level granularity to address the unpredictable nature of compute. That's what network-attached processing does. Just as network-attached storage solved that in the storage world, and networking in general solved it in the networking world, we solve it in the world of compute.
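The capacity-planning argument rests on a simple statistical fact. When applications peak at different times, the peak of their combined demand is lower than the sum of their individual peaks. The sketch below compares two approaches:

- provisioning each application for its own peak
- provisioning one shared pool for the combined peak

The three applications and their hourly demand are invented for illustration. Neither approach adds the safety headroom a real planner would.

```python
# Hypothetical hourly demand, in processor-equivalents, for three applications
# whose peaks fall at different hours. Illustrative numbers only.
demand = {
    "trading": [10, 40, 12, 8],
    "payroll": [5, 6, 35, 5],
    "reporting": [30, 4, 6, 7],
}


def dedicated_capacity(demand: dict[str, list[int]]) -> int:
    """Each application gets its own servers, sized for its own peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand: dict[str, list[int]]) -> int:
    """One shared pool only has to cover the peak of the combined demand."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return max(combined)


def utilization(demand: dict[str, list[int]], capacity: int) -> float:
    """Average fraction of provisioned capacity actually used."""
    total_work = sum(sum(series) for series in demand.values())
    hours = len(next(iter(demand.values())))
    return total_work / (capacity * hours)


for label, cap in [("dedicated", dedicated_capacity(demand)),
                   ("pooled", pooled_capacity(demand))]:
    print(f"{label}: {cap} processors, {utilization(demand, cap):.0%} utilized")
# dedicated: 105 processors, 40% utilized
# pooled: 53 processors, 79% utilized
```

Adding real headroom would lower both utilization figures. The gap between them remains as long as the peaks do not coincide.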


Technology advances will address the latency issues

What about the issue of latency? If you're taking the workload off the server, sending it to the Compute Appliance, crunching the numbers and sending it back, aren't you adding latency into the equation?

We're another hop on the wire, so obviously we introduce wire-level latency between the host and us. In a zero-load environment, we add a couple of microseconds to the process, but nobody cares about a zero-load environment. What people care about is a loaded environment, and in a loaded environment, we effectively eliminate latency.
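Standard queueing theory offers one way to see why an extra hop can matter less than load. The sketch below uses the textbook M/M/c model: Poisson arrivals, exponential service times and c identical processors. This model is a stand-in chosen here, not Azul's own analysis. The sketch compares two hypothetical configurations, both at 90 percent utilization:

- a four-processor host
- a 96-processor pool shared by 24 such hosts

It charges the pool an extra 2 microseconds for the network hop. All rates are assumptions.

```python
def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving request must queue in an M/M/c system.

    offered_load is arrival_rate / per-server service rate, in Erlangs.
    Computed from the numerically stable Erlang B recursion.
    """
    if offered_load >= servers:
        raise ValueError("unstable: utilization must be below 100%")
    b = 1.0  # Erlang B with zero servers
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / servers
    return b / (1 - rho * (1 - b))


def mean_response_time(arrival_rate: float, service_rate: float,
                       servers: int, network_hop: float = 0.0) -> float:
    """Mean seconds per request: fixed hop + queueing wait + service time."""
    offered = arrival_rate / service_rate
    wait = erlang_c(servers, offered) / (servers * service_rate - arrival_rate)
    return network_hop + wait + 1 / service_rate


MU = 100.0  # hypothetical: each request needs 10 ms of one processor

host = mean_response_time(360.0, MU, servers=4)  # 90% busy
pool = mean_response_time(24 * 360.0, MU, servers=96, network_hop=2e-6)  # 90% busy

print(f"4-processor host: {host * 1000:.1f} ms")
print(f"96-processor pool + hop: {pool * 1000:.1f} ms")
```

With these assumptions, the host's mean response time is about 30 ms, most of it spent queueing. The pool stays within about a millisecond of the 10 ms service time, even after the 2-microsecond hop. The gain comes from pooling many processors at the same utilization. A lightly loaded host would show the reverse, with the hop as pure overhead.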

This is the first 21st-century computing platform: birthed in the 21st century, created in the 21st century, built in the 21st century, so we're not going back to the challenges of the past. Our underlying processor architecture is perfectly suited for virtual-machine workloads, so we don't end up in the sort of queuing penalty box that exists in virtually every other server that's ever been built. Not only do we have enormous throughput, which solves a lot of the latency issues in general, but we also bring so much power to bear that, in a loaded environment, latency doesn't become an issue.

We're actually starting to see this in certain metro areas like London, and it most certainly will happen in New York; I think it's going to become pervasive over the next decade. The service provider, whether it's the big telco or whoever it is in a given geography, has so much control over the fiber infrastructure there that it is only a handful of microseconds away from its customers' data centers. So it is truly able to vend processing power to host systems located on a customer's premises with virtually no latency overhead. … It's not all there today. It won't be in mass markets in the next three or four years. But within the next 10 years, I think you're going to see a completely different compute model, given the work that we're doing and the evolution of network-attached processing, what the providers are doing with their fiber infrastructure and what the networking vendors are doing. The legacy model is in the twilight of its historical relevance.


Right now this technology is targeted at J2EE workloads. Will we be seeing support for .Net?

Absolutely. And the amazing thing is that it's the same infrastructure: no changes to the hardware, no changes to the microprocessor, which nobody's ever done before, and that goes to the agnostic nature of network-attached processing. We're not in the OS game. We don't do what IBM and [Hewlett-Packard Co.] and Sun [Microsystems Inc.] and Dell do, as far as that's concerned. We're focused on delivering processing cycles.

The engineering challenge in front of us for .Net support is bringing the same sort of segmented virtual machine work that we pioneered in the world of Java to the world of the CLR [Common Language Runtime]. We're in discussions with Microsoft on that, and we hope to announce a formal plan of record in the weeks ahead.

Looking forward, in what other directions are you hoping to take Azul?

There are a couple of things. If you go back to the foundation of the company, the idea was to map the architecture to the way that people build applications today. The shared compute pool model is applicable in other areas of processing as well. Take SSL [Secure Sockets Layer], for example. I think enterprises would SSL everything if they could capacity plan for it and deliver it effectively. But they don't, because of the challenge that's associated with it. Those are big pools of compute that can be delivered. The same goes for XML, etc. So this whole concept of delivering big pools of processing power extends into a number of other processing areas, and we're looking at that.

Another area of intense interest to us is the role of the network between the tiers. If you subscribe to the vision that networking and storage and the various forms of processing power are ultimately services, not server-bound, then that requires a level of homogenization between network infrastructure and the infrastructure that delivers these services. So that is an area that we are investing heavily in.

