NAND Flash in the Data Center: 10 Potential Pitfalls to Avoid

By Chris Preimesberger  |  Posted 2012-02-27

Seamless Integration

Many flash memory products address only narrow application areas. Some database accelerators require deep understanding and a tuning effort to implement, and then serve only a very specific part of the application stack. Beware of "specialized" storage, the kind that says it is specially designed for a particular use case. If the system is optimized for whatever that use case looks like today, what happens tomorrow when the software or system vendor comes out with the next release and things change?

Profit Per IOPS

If dollars per gigabyte were the only thing that mattered, no one would buy anything but tape. So why, when buying high-performance systems, do people compare them based only on dollars per gigabyte? Because they assume that within a given storage tier everything provides the same performance. Servers, software licenses, people: They cost you the same amount per hour regardless of how many IOPS (I/O operations per second) they get, but their productivity per hour is a direct function of how many IOPS they get. The real costs are typically modeled at peak-performance requirements. When high IOPS are required and that requirement is sustained over time, the dollar cost will be inflated by systems that cannot sustain that performance.
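
To make the comparison concrete, here is a minimal Python sketch that ranks two systems by dollars per gigabyte and by dollars per sustained IOPS. The class name, the two example systems and every price, capacity and IOPS figure are hypothetical, chosen only to show how the two metrics can point in opposite directions.

```python
from dataclasses import dataclass


@dataclass
class StorageSystem:
    """A storage system described by price, capacity and sustained IOPS."""
    name: str
    price_usd: float
    capacity_gb: float
    sustained_iops: float

    def dollars_per_gb(self) -> float:
        return self.price_usd / self.capacity_gb

    def dollars_per_iops(self) -> float:
        return self.price_usd / self.sustained_iops


# Hypothetical figures for illustration only, not vendor data.
disk_array = StorageSystem("disk array", 100_000.0, 100_000.0, 20_000.0)
flash_array = StorageSystem("flash array", 200_000.0, 20_000.0, 400_000.0)

for system in (disk_array, flash_array):
    print(
        f"{system.name}: "
        f"${system.dollars_per_gb():.2f}/GB, "
        f"${system.dollars_per_iops():.2f}/IOPS"
    )
```

With these made-up numbers the disk array is 10 times cheaper per gigabyte, while the flash array is 10 times cheaper per sustained IOPS, which is the metric that tracks the productivity of the servers, licenses and people waiting on the storage.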

Performance Issues of Consumer-Grade MLC

Multi-level cell (MLC) flash is an emerging technology, and leveraging it for enterprise storage requires high levels of redundancy. Its higher latency, compared with single-level cell (SLC) flash, puts more demand on the controller architecture to distribute the I/O so as to maximize and sustain MLC performance. The reason most vendors are hot on using MLC is that their controllers/shelf heads are so slow compared with the flash that it is the controller, not the flash, that limits performance. Therefore, if they replaced their MLC SSDs with SLC SSDs, there would be little or no increase in performance.

Application vs. Capacity Needs

Applications such as big data analytics, business intelligence, simulation and decision support are demanding more performance out of storage. In-memory applications are very expensive to scale, but Tier 1 flash memory systems can reduce that cost significantly by providing memory-like latency with enterprise storage capacity. As applications scale in the data they consume and produce, capacity alone is a less relevant characteristic, while performance, reliability and sustainability (power, cooling and space overhead) become more relevant.

Real-Time Analytics

Real-time analytics are typically ad hoc, user-driven queries. As such, you want a consistent user experience (for example, 99 percent of response times under 5 seconds). Disk-based performance is much less deterministic than flash memory performance, so disk systems must be sized for the worst case and end up over-provisioned for nonpeak loads. The more time a system spends out of peak, the less return you get on that cost.
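
A response-time target of this kind can be checked against measured query latencies. The following Python sketch assumes the "99 percent under 5 seconds" target from the example above; the function names and both sample latency lists are hypothetical.

```python
def fraction_under(latencies_s, threshold_s):
    """Return the fraction of response times strictly under the threshold."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    under = sum(1 for t in latencies_s if t < threshold_s)
    return under / len(latencies_s)


def meets_target(latencies_s, threshold_s=5.0, target=0.99):
    """True if at least `target` of the responses beat the threshold."""
    return fraction_under(latencies_s, threshold_s) >= target


# Hypothetical samples, in seconds: one consistent system, one erratic one.
consistent = [1.0] * 99 + [6.0]      # 99 of 100 queries under 5 seconds
erratic = [1.0] * 90 + [8.0] * 10    # 90 of 100 queries under 5 seconds

print(meets_target(consistent))  # True
print(meets_target(erratic))     # False
```

Both sample sets have a low typical latency; only the share of slow outliers differs, which is why a percentile target says more about user experience than an average does.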

Quality of Service

Flash storage needs to provide deterministic performance as the mix of reads and writes changes. Availability is also tied directly to quality of service (QoS). Hot-swappable components with hot spares provide a higher class of QoS for mission-critical apps.

Fault Tolerance

Fault-tolerant server flash costs you in more ways than one. Not only do you need two to three times the storage locally, but you also need it at your replication sites. Worse, the latency and bandwidth promise of server flash is lost once all writes have to be synchronously mirrored to one or more local servers, even if remote replication is asynchronous. Now your write latency is worse than that of storage-area network (SAN) storage, and your write bandwidth is a fraction of the bandwidth of your network port. If you have to bring the system down to replace a failed component, that can impact the availability of your application. Typically, this means you need to double or triple the number of systems if those systems do not have hot-swap capability for the replacement of components.
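
The capacity and latency penalties described above can be approximated with simple arithmetic. This Python sketch is a simplified model, not a measurement: it assumes the local write and the mirror write are issued in parallel and that the write is acknowledged only when the slower of the two completes. The function names and all input values are hypothetical.

```python
def raw_capacity_needed_tb(usable_tb, local_copies, sites):
    """Raw flash to buy when every site keeps `local_copies` of the data."""
    return usable_tb * local_copies * sites


def mirrored_write_latency_us(local_write_us, network_rtt_us, remote_write_us):
    """Latency of a synchronously mirrored write (simplified model).

    Assumption: the local write and the mirror write start together, and
    the acknowledgment waits for whichever path finishes last.
    """
    mirror_path_us = network_rtt_us + remote_write_us
    return max(local_write_us, mirror_path_us)


# Hypothetical inputs: 10 TB usable, 3 local copies, 2 sites.
print(raw_capacity_needed_tb(10, 3, 2))        # 60

# Hypothetical inputs: 50 us flash write, 200 us network round trip.
print(mirrored_write_latency_us(50, 200, 50))  # 250
```

Under these assumed numbers the network round trip, not the flash, dominates the write latency, which is the effect the paragraph above describes.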

Refresh and Replacement

Refresh cycles and capacity/capability increases never end. With server flash, if you need more capacity, you have to replace what you have or increase it by whole multiples. You can't just buy an extra 25 percent each quarter. When it's time to refresh your servers, you have to refresh the server flash on the same schedule, and retire your existing storage along with the servers holding it. If you're adding high-performance servers to solve a problem, you have to buy storage just for the new servers. If your flash is in SAN-attached arrays, when you add more servers, you don't have to add storage, and when you add storage, you don't have to add servers. When you refresh servers or storage, you can do them separately on different time schedules.

Commodity Hardware Infrastructures

Not all flash devices or systems are the same, nor do they all support the same protocols and standards. The fit with existing infrastructure, support for interconnects with appropriate performance and the needs of shared storage must be taken into account.

Hidden Costs of Dedicated Storage

Peripheral Component Interconnect Express (PCIe) cards in servers are never filled. That's because a) the performance is so bad if they are; b) the app storage requirements in a large data center never evenly fill the local storage of each server; and c) you pay for the storage, even if you don't use it. Disk arrays that have to be partitioned or dedicated to certain apps (because of disk contention) cost you more because you still paid for the storage you aren't using, as well as for the extra arrays and licenses that go with them. Flash has no contention issues, and when it's networked, you can share it fully and get the use of everything you paid for.
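
The cost of unevenly filled local storage can be estimated by adding up the capacity that is paid for but unused on each server. The Python sketch below assumes every server carries the same amount of local flash; the function names and the per-server usage figures are hypothetical.

```python
def stranded_capacity_gb(purchased_gb_per_server, used_gb_per_server):
    """Total capacity that was paid for but is unused across all servers."""
    for used in used_gb_per_server:
        if used > purchased_gb_per_server:
            raise ValueError("a server cannot use more than it has installed")
    return sum(purchased_gb_per_server - used for used in used_gb_per_server)


def utilization(purchased_gb_per_server, used_gb_per_server):
    """Fraction of all purchased local capacity that is actually in use."""
    total_purchased = purchased_gb_per_server * len(used_gb_per_server)
    return sum(used_gb_per_server) / total_purchased


# Hypothetical data center: four servers with 1,000 GB of local flash each.
used = [200, 800, 500, 100]

print(stranded_capacity_gb(1000, used))  # 2400
print(utilization(1000, used))           # 0.4
```

In this made-up case 60 percent of the purchased flash sits idle. A shared, networked pool sized for the same 1,600GB of actual demand would not strand capacity on servers that happen to need less.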

Rocket Fuel