Mike Vizard: Today we're going to talk about storage and application performance. Gear6 has a special device that focuses on caching and storage for certain types of applications. And I'm going to let Gary describe how it works.
Gary Orenstein: Basically, Gear6 is focused on what we call storage acceleration, particularly application acceleration. The product Gear6 makes is what we call a scalable caching appliance, which is essentially an appliance that clusters together high-speed, high-capacity memory to serve data 10 to 50 times faster than if that data were coming from traditional mechanical disk. The idea is to place this caching appliance in the network to complement all the existing storage that is there, offloading some of the more data-intensive processes from the disk, serving them from memory instead and ultimately speeding up applications.
Vizard: How do I figure out what parts of the application or the database to put in memory versus rely on disk, and where do I draw those lines?
Orenstein: One of the elegant things about caching is that it is, by its very nature, a somewhat management-less implementation. When the appliance is placed in the network, it delivers the data that's requested by the application, and over time the caching appliance simply keeps the most frequently accessed data in cache. So, from an administrator's perspective, what the administrator needs to do is simply make sure they have a right-sized caching appliance in their environment, so that the bulk of the active data set remains in cache. If the workload changes over time, the cache will simply keep the most frequently requested data in cache. And if a piece of data is no longer being actively used, it will fall out of cache, all the while remaining in the persistent storage layer.
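To make that eviction behavior concrete, here is a minimal sketch in Python of a least-recently-used (LRU) cache in front of a backing store. Gear6 hasn't specified its actual caching policy, so LRU is only a stand-in for "keep the hot data, let cold data fall out," and the `backing_store` dict is a toy stand-in for the persistent storage layer:

```python
from collections import OrderedDict

class LRUCache:
    """Keeps recently used items in memory; evicted items are dropped
    from cache but still live, untouched, in the backing store."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # stands in for persistent storage
        self.cache = OrderedDict()

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        value = self.backing_store[key]     # slow path: fetch from "disk"
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # coldest entry falls out of cache
        return value

# Hot blocks stay resident; rarely read blocks fall out of cache over time,
# but nothing is ever lost -- the backing store keeps every copy.
disk = {f"block{i}": f"data{i}" for i in range(1000)}
cache = LRUCache(capacity=100, backing_store=disk)
for _ in range(5):
    cache.read("block7")
```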
Vizard: So I don't have to do any fancy artwork around figuring out what part of the database is the most frequently called? The appliance will kind of naturally come to that?
Orenstein: Exactly, and that has not always been the case, because, historically, caching has been a very scarce resource. It's been very limited in terms of the amount of cache that could be kept on individual application or database servers and the amount of cache that could be kept on storage systems. But now, our caching appliance allows customers to build cache pools that can range into the terabyte size. It's possible for customers not to worry about fine-tuning everything, and simply to have a cache size that's large enough to keep the majority of the active data set.
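As a rough illustration of what "right-sized" means, real workloads tend to be skewed: a small share of the data receives most of the accesses. Assuming, purely hypothetically, a Zipf-like access distribution, a few lines of Python show what fraction of requests a cache holding only the hottest blocks would absorb:

```python
# Hypothetical skewed workload: block i is accessed with probability ~ 1/i.
# What share of requests lands in a cache holding the hottest N blocks?
def zipf_coverage(total_blocks, cached_blocks, s=1.0):
    weights = [1.0 / (i ** s) for i in range(1, total_blocks + 1)]
    return sum(weights[:cached_blocks]) / sum(weights)

total = 1_000_000                    # blocks in the full data set
for n in (10_000, 50_000, 100_000):  # candidate cache sizes, in blocks
    print(f"cache of {n:>7} blocks covers {zipf_coverage(total, n):.0%} of requests")
```

Under that assumed distribution, caching even 1 percent of the data set absorbs roughly two-thirds of the requests, which is why a terabyte-scale cache pool can hold the majority of the active data set without per-object tuning.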
Vizard: Now, most people would have thought, at least historically, that cache is too expensive to use in this kind of context. So, they would sit there and say, you know, we're just going to use storage devices and hope for the best and hope for some good IO. But now, I'm starting to see cache show up on the drives themselves from certain vendors. So, what is happening with the cost structure around cache that's making it more affordable or reasonable to try this?
Orenstein: A couple good points there. One is that cache has been around in the data center for a long time and will remain at all levels, from the server, to the storage device, to the disk drives themselves. In terms of measuring the cost/benefit of this type of solution, you really have to focus on IO operations per second, as opposed to just the traditional metric of capacity. And when you look at the equation from an IO operations per second, or IOPS, perspective, caching is actually relatively cheap compared to the cost of having to deploy hundreds or even thousands of disks to get that same kind of performance level. I think what we're seeing today is that, historically, disks have been cheap and customers have chosen to deploy them because they were cheap, often just to get performance. But now, with the costs of power, and cooling and space rising, and people looking to consolidate their environments, I think folks are realizing that deploying disks alone isn't necessarily the most cost-effective solution. So what we're seeing is customers consolidate their storage into a higher-utilized pool, get the utilization rate up, have the storage capacity they need, and then complement that with a caching appliance. The other point is that now – since you can put the caching appliance in the network and have it provide a boost to any application server or any storage data set – you're able to amortize, or spread, the cost of it over a much greater range of the data center than you could by placing a very limited amount of cache in one location only.
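Orenstein's IOPS-versus-capacity point is easy to put into rough numbers. The figures below are illustrative assumptions, not vendor specifications: a 15K RPM drive sustains on the order of 180 random IOPS, so reaching a six-figure IOPS target from spindles alone takes hundreds of drives:

```python
# Back-of-the-envelope disk count for a target random-IO rate.
# Both numbers are illustrative assumptions, not measured figures.
target_iops = 100_000
iops_per_disk = 180                               # rough 15K RPM drive figure
disks_needed = -(-target_iops // iops_per_disk)   # ceiling division
print(f"{disks_needed} disks to sustain {target_iops} IOPS from disk alone")
# -> 556 disks, plus their power, cooling and floor space; a memory-based
#    cache can serve the same hot data set from a single appliance.
```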
Vizard: A lot of people don't talk about this, but it's my personal viewpoint, at least, that the performance issues around storage, particularly in a network environment, are what lead to the low utilization rates there. We've seen it with storage arrays today, because people are dedicating arrays to specific applications; they don't want to be concerned about any performance degradation. So, does the caching of a network appliance kind of take that issue off the table, so now we can start getting after those utilization rates in storage?
Orenstein: It does to a large extent. Historically, people have had to actively manage all kinds of storage devices and even do things such as tiering, where there might be a low-capacity, high-performance storage tier complemented with a higher-capacity, lower-cost tier. All of that, of course, requires work on behalf of the IT administrator, and I don't know any IT administrators who voluntarily choose to slice and dice their storage infrastructure into umpteen different tiers. What we're seeing is the move toward a type of architecture that simplifies all that into what can be loosely termed an accelerated archive, where, on the one hand, you might keep a large-capacity pool of disk-based storage – maybe something using, for example, SATA drives that provide a very low-cost but high-capacity pool of storage – and then complement that with a caching appliance to provide the performance where it's needed. And since that whole environment is dynamic, and since the caching appliance can react dynamically to data sets or pieces of data that become frequently accessed and change back and forth, it doesn't require active management on behalf of the IT administrator. Once the infrastructure is set up, they can sort of sit back and let the caching appliance do what it does best, which is make the most frequently accessed data highly available and served with very low latency from high-speed memory.
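The payoff of that accelerated-archive design can be captured with the standard average-latency formula: with cache hit ratio h, memory latency t_mem and disk latency t_disk, the effective latency is h·t_mem + (1−h)·t_disk. A quick sketch, with assumed round-number latencies:

```python
# Effective read latency of a cheap SATA pool fronted by a memory cache.
# The 0.1 ms and 10 ms figures are assumed round numbers for illustration.
def effective_latency_ms(hit_ratio, t_mem_ms=0.1, t_disk_ms=10.0):
    return hit_ratio * t_mem_ms + (1 - hit_ratio) * t_disk_ms

for h in (0.0, 0.80, 0.95, 0.99):
    print(f"hit ratio {h:.0%}: {effective_latency_ms(h):.2f} ms average read")
# At a 95% hit ratio the SATA pool behaves, on average, like storage
# roughly 17x faster than its raw disk latency.
```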
Vizard: How does the appliance, which sits on the network, discover the various servers it's supposed to interact with out there? And is it heterogeneous by nature, or do I have to dedicate it to specific operating systems? How does that all work?
Orenstein: It is heterogeneous by nature and doesn't need to be specifically dedicated, but let's walk through the basic deployment model. Right now, Gear6 is focusing on the network attached storage, or NAS, market, and specifically the NFS protocol. Those environments typically include multiple clients or application servers and multiple NAS devices, usually all networked together with gigabit Ethernet. The deployment model for the Gear6 caching appliance, which we call Cache FX, is to plug that appliance into the network via gigabit Ethernet, and then simply to identify the NAS storage devices that you'd like to accelerate, so those appear in the management interface. From that point, customers can direct the application servers that are IO-constrained to view the data through the caching appliance. So it's a relatively simple deployment model. One of the benefits of that model is that there's absolutely no change whatsoever to the existing storage devices, and there's absolutely no change to the applications as well. It's a very low-risk, high-reward opportunity for anybody who's facing an IO constraint – or what people sometimes refer to as an application brownout scenario – where they simply can't get enough performance from their traditional storage devices.
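To see why that counts as low risk, the sketch below illustrates the deployment model: the only change is on the client side, where the NFS mount points at the appliance instead of directly at the filer, while the filer's exports stay untouched. All hostnames and paths are invented for the example, since the transcript doesn't include a real configuration:

```python
# Hypothetical before/after view of the Cache FX deployment model:
# nothing changes on the NAS device or in the application; only the
# client's mount target does. Names below are invented for illustration.
exports = ["/vol/db_logs", "/vol/db_data"]
filer = "nas01.example.com"         # existing NAS device (unchanged)
appliance = "cachefx.example.com"   # caching appliance fronting the filer

for path in exports:
    print(f"before: mount {filer}:{path} /mnt{path}")
    print(f"after:  mount {appliance}:{path} /mnt{path}")
```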
Vizard: There's a lot of brownout effect out there. What usually happens is a lot of finger pointing, because the infrastructure people point their finger at the developers for writing, you know, heavy, bloated applications. And the developers point their fingers back at the network infrastructure guys for having, you know, shoddy networks that are full of traffic that is being abused and not germane to the business at hand. How does this give everybody some kind of middle ground that they can come to without going to war every day over who's wrong?
Orenstein: We like to think of Gear6 as helping customers end the blame game, because that can be so troubling, especially when dealing with multi-vendor environments. One of the things that Gear6 provides to all of our customers and prospective customers, and that's freely available on the Gear6 Web site, is a tool that we call NEMo, which is an IO-analysis tool. This is a simple Perl script that can be run on an application server to essentially take a snapshot of the IO traffic between the application and the storage device. We can take that trace and analyze it against a real live product in our lab to determine what we'd call an acceleration factor, and present that to customers to say, “This is the range of performance improvement that we see, based on your real, live data.” So this is not some guesstimate that's just coming out of a discussion; we're actually looking into the IO stream to see specifically what behavior is taking place, and to say how we can help in that particular case. And, frequently, that does remove some of the finger pointing and give customers a really clear picture by isolating the specific IO bottleneck, which Gear6 can then solve.
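The transcript doesn't reproduce NEMo itself (the real tool is a Perl script), but the shape of the analysis translates into a few lines of Python: read an IO trace, measure how concentrated the accesses are, and treat the hot-data share as a crude proxy for what a cache could absorb. The trace format and the metric here are invented for illustration:

```python
from collections import Counter

# Toy stand-in for an IO-trace analysis in the spirit of NEMo.
# Each record is simply the block address touched by one IO request.
trace = ["b1", "b2", "b1", "b3", "b1", "b2", "b4", "b1", "b2", "b1"]

counts = Counter(trace)
hot_blocks = [blk for blk, _ in counts.most_common(2)]  # two hottest blocks
hot_ios = sum(counts[blk] for blk in hot_blocks)

concentration = hot_ios / len(trace)
print(f"top {len(hot_blocks)} blocks absorb {concentration:.0%} of the IO stream")
# High concentration like this is what makes a large "acceleration
# factor" plausible before any hardware is installed.
```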
Vizard: Do you perceive over time that, as the storage arrays get more powerful – and a lot of them are carrying their own RISC processors now – more of the application load may move off the server for certain types of applications, and move over to the storage array that's also a co-processor with some basic compute-engine capability in it?
Orenstein: I completely agree with your statement. The way I like to phrase it is, if you were to look at the typical data center and divide it into the server layer, the network layer and the storage layer – and maybe take an X-ray of that to identify where the processing power is and where the memory is – what you'd see is that it's awfully top-heavy right now: there's an incredible amount of processing power at the server layer, with the multi-CPU motherboards, the multi-core CPUs and the virtual machines that have taken hold as well. The networking layer is very robust in terms of the silicon and the processing that goes in there. The storage layer is still a little bit light, in general, on processing and memory; of course, it's heavily weighted to traditional disk media, rotating magnetic media. And so I think what's going to happen over time, with the likes of Gear6 caching appliances, and other appliances and new storage systems that are coming out on the market, is that you'll see a balancing of the processing and the memory across all of those server, network and storage layers. And, absolutely, over time, I think the storage devices are going to get more intelligent. I think the networks are going to get more intelligent, and I think a number of appliances that complement that infrastructure will become increasingly more intelligent, rebalancing the data center as a whole.
Vizard: In that model, when I think about the world, it's a top-down, servers/network/storage kind of play. Are you suggesting that that may flip, where it could become a storage/network/server kind of architecture, because the servers (you could argue they're becoming $2,000 peripherals right now) are just commodity compute engines?
Orenstein: You know, I always like to remind people that it's called the data center, not the server center. And so, data is really the asset that companies want to protect, and manage and extract the most information and value from in one way or another. And if that core information is the primary asset, then the question becomes, how do we get the most access to it, the most rapid access to it, and the ability to crunch that data and turn it into useful information? And your hypothesis, I think, is spot on, in that we may see a reversal of the architecture, working from the data out as opposed to from the server or the application in.
Vizard: Given that your device has cache and is essentially a box, do you worry at all that the IBMs of the world, and the EMCs or NetApps, are just going to load up their own cache appliance and set that up in front of their own set of devices? How will you stand up as a competitor?
Orenstein: That's a good question. The companies you mention generally focus on making storage devices, as opposed to devices that reside in the network as caching devices – none of them has a caching device like that right now. We tend to be a very strong partner with any company that's making a storage system, because we go into that environment. And what's unique about Gear6 compared to some other emerging companies in the market is that we don't have a new file system and we don't provide persistent storage. So, the very architectural model of Gear6 and our Cache FX appliance is to enhance an existing storage footprint at the persistent storage layer, and that could be from IBM, or EMC or Network Appliance, or any other major vendor out there. I think, again, those companies are specialists, typically, at what I would call edge devices that are very effective at providing persistent storage, and the services that go around persistent storage, such as backup, recovery, snapshotting, replication, provisioning and a host of other tools. They haven't been as strong in making devices that are a little bit more network-resident, and so that's an area where Gear6 is focused and, again, it seems to be a very strong complement to what the persistent storage vendors are delivering today.
Vizard: What do you think the ambitions of the networking companies are going to be in this space as it applies to storage? You hear Cisco increasingly talk about storage and, you know, your appliance is basically embedded in the network as well. So, what do you think will happen there?
Orenstein: I think you're going to see a recognition by the networking companies that they inevitably will want to move closer and closer to the data. All data moves over some type of network. Today, we're seeing a lot of activity in the networking market around things related to wide-area file systems and wide-area acceleration, essentially tackling the problem of Internet latency with various caching, optimization and compression mechanisms. And, in addition to a number of very successful emerging companies in that area, all of the large networking companies, including Cisco, and Juniper and others, have gotten into that arena. As well, a number of companies have begun to break into the data center in what I would call more of the Web-serving and load-balancing arena. Juniper, and F5 and Cisco all make products that sort of help bring the data center out to the Internet and the Internet into the data center. As those areas become more developed, you'll see a further penetration into the data center – and that could be by the networking companies or by the larger storage companies – to tackle the next problem, disk latency. Once the Internet latency issue has become relatively solved or relatively mature, the next stop is going a little bit further into the data center and tackling the disk latency problem, which, of course, isn't going away. And things like the arrival of a terabyte drive aren't helping the equation; they're only exacerbating this problem of what we call the server storage-performance gap. I think you will see a lot of interest in penetrating further to accelerate the end-to-end process, whether that's a user at the end of a Web service, across the Internet, all the way into the data center to that core data set, whether they're delivering some piece of information from a database, or a video or music clip from a content library.
Vizard: So, in that model, then, we can probably see some convergence between the networking and storage vendors. And who knows, we might ultimately see a merger between mega-partners like Cisco and EMC at the rate we're going.
Orenstein: It might be, and maybe Gear6 as well.