Taking Care of Data: ILM Gets Down to Business

By Brian Fonseca  |  Posted 2006-03-26

Bill Graff, senior manager of infrastructure for CernerWorks, the remote hosting business unit of Cerner Corp., of Kansas City, Mo., is trying to outrun a data avalanche.

"Storage is a big budget-line item for us. Its significant because we do 1.5 petabytes of [spinning disk] storage today, and thats more than double what we had at this point last year," Graff said. "Thats the rapid growth of data were up against."

Graff's real challenge: managing that data cost-effectively. To help maintain information availability, support SLAs (service-level agreements) and brace its storage architecture against the daily onslaught of data squeezing its capacity, CernerWorks is embracing a tiered storage methodology built on top of Hewlett-Packard technology.

Graff, who can't allow storage spending to increase beyond 16 to 17 percent of the company's shrinking technology budget, divvies up the company's data into five levels of storage tiers, much like a postal worker would sort mail into bins. Data is categorized based on whether it is in production, whether it serves a purpose such as meeting a compliance requirement, and how expensive it is to keep.
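The sorting Graff describes can be sketched as a simple classification rule. The tier numbers, thresholds and function below are illustrative assumptions, not CernerWorks' actual policy:

```python
# Hypothetical sketch of tier assignment based on the criteria Graff
# describes: whether data is in production, whether it serves a
# compliance purpose, and how long it has gone unused. The specific
# rules and cutoffs are assumptions for illustration only.
def assign_tier(in_production: bool, compliance_hold: bool,
                days_since_access: int) -> int:
    """Return a storage tier from 1 (fastest, most expensive) to 5 (tape)."""
    if in_production:
        return 1                 # live production data stays on top-tier disk
    if days_since_access <= 7:
        return 2                 # recently used copies on midrange disk
    if compliance_hold:
        return 3                 # retained for compliance on archive storage
    if days_since_access <= 90:
        return 4                 # near-line backup for rapid restore
    return 5                     # long-term, off-site tape


# Example: a nonproduction file untouched for a year, no compliance hold
print(assign_tier(False, False, 365))  # -> 5
```

The point of such a rule is that placement becomes a function of the data's attributes rather than an ad hoc purchasing decision.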

This approach, known as ILM (information lifecycle management), will be a key topic during the Storage Networking World conference in San Diego April 3-6.

Hitachi Data Systems, a subsidiary of Hitachi Ltd., will announce "tiered storage in a box," sources said. The product will reside in HDS' portfolio of midrange systems. The company declined to comment.

In addition, Compellent Technologies and OnStor will introduce an integrated SAN (storage area network) and NAS (network-attached storage) offering featuring automated tiered storage capabilities. The partnership aims to cut storage expenditures by automatically migrating data among multiple storage tiers.


Meanwhile, StoredIQ will announce Version 3.6 of its ICM (Information Classification Management) 5000 information server, a key steppingstone toward the data classification that ILM depends on. The release adds four-node and eight-node cluster options to drive distributed file- and context-based classification.

And, in the next two months, HP will get into the ILM act via its recent acquisition of OuterBay—a database archiving provider—and the first fruits of its relationship with Mendocino Software to incorporate CDP (continuous data protection) technology.

HP will unveil a new CDP product that is based on a redesigned version of Mendocino's RecoveryOne CDP appliance, which HP currently resells as part of an OEM agreement signed last year. HP also is expected to make an announcement regarding its HP StorageWorks RIM (Reference Information Manager) for Database Archiving offering based on OuterBay technology. Sources say HP will tightly integrate the Mendocino and OuterBay technologies to simplify ILM implementations.


Why all the attention for ILM? When it comes to storage, technology managers such as Graff; Marty Colburn, chief technology officer and executive vice president for the National Association of Securities Dealers; and Joe Furmanski, lead technology architect for the Information Systems Division at the University of Pittsburgh Medical Center, have two choices—spend more on storage or become savvy about how data is managed.

Content, of both the structured and unstructured variety, continues to swell and become more sensitive. That data avalanche means it's no longer economically viable to add more servers and storage devices to manage items such as e-mail and electronic documents.

Making matters worse, tight budgets and shrinking backup windows are putting the squeeze on resources often used in the past to handle mushrooming data growth.

As a result, organizations are being forced to become smarter about aligning storage and information needs. That's opening doors for ILM, which helps separate data into multiple tiers based on its day-to-day importance.

"After you have [an ILM] solution in place, ideally it should allow a customer to purchase storage hardware more intelligently than they have in the past," said Charles King, an analyst at Pund-IT Research, in Hayward, Calif.

The problem with King's vision of storage utopia is that many of the pieces an ILM deployment needs, such as tools that marry an archiving workflow with a tiered storage architecture, do not yet exist.


"Were struggling with archiving and implementing real lifecycle data management because our customers say, I cant wait for 15 to 30 minutes for archive [off tape] to come back to me. I need it now," said Graff. "Its going to be critical that storage providers provide additional tools in the ILM realm."

Products alone, however, aren't going to do the job. Companies also have to better link ILM with the business processes that route and use data. For financial services organizations such as the Washington-based NASD, ILM must address compliance while quickly sifting, sorting and tagging the data pouring in from brokers, trade clearinghouses and other financial services entities.

Looking at the Full Lifecycle of Data

To build an ILM environment, "you have to understand how mission-critical the data is—how it's going to be used," said the NASD's Colburn. "Will it be toward analytics? Reporting? How often will it be accessed? You have to understand the data usage across an organization. That becomes a critical point when you start to look at full data lifecycle management."

The NASD's handling of data is under especially tight scrutiny due to its role as the primary regulator of the U.S. securities industry. It regulates more than 5,100 brokerage firms, about 115,940 branch offices and about 657,800 brokers.

Colburn said the NASD relies on preset rules that require specific information from member firms to enter its IT systems at certain points in time. Once data arrives, it's moved onto storage, fed into an operational view that reflects how the NASD uses it, and then migrated according to regulatory, reporting and analytics needs.

This process is driven by the timeliness of data. Most of the NASD's data is held at an operational level, or production environment, on Tier 1 storage. By comparison, Tier 2 is used to develop applications and mechanisms that involve a longer-term view of data, such as disaster recovery.
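The timeliness-driven placement Colburn describes can be modeled as a policy that demotes data from Tier 1 to Tier 2 once its operational window passes. The 30-day window and the function below are assumptions for illustration, not NASD policy:

```python
from datetime import date, timedelta

# Hypothetical sketch of timeliness-driven tier placement: data stays on
# Tier 1 (the operational/production environment) while fresh, then moves
# to Tier 2 for longer-term uses such as disaster recovery. The 30-day
# operational window is an illustrative assumption.
OPERATIONAL_WINDOW = timedelta(days=30)

def placement(received: date, today: date) -> str:
    """Return the tier a record should occupy on a given day."""
    return "tier1" if today - received <= OPERATIONAL_WINDOW else "tier2"

today = date(2006, 3, 26)
print(placement(date(2006, 3, 20), today))  # fresh data -> tier1
print(placement(date(2005, 11, 1), today))  # aged data -> tier2
```

Because placement is recomputed from a timestamp, the same rule can be run on a schedule to sweep aging data down a tier automatically.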

The NASD is running EMC's Symmetrix off its SAN for mission-critical data and Tier 1 storage, as well as EMC Centera units for e-mail archival. For Tier 2 purposes, the organization uses EMC Clariion boxes for development and is currently evaluating Sun Microsystems' StorEdge 6920 technology in the same environment.


Balancing budgetary considerations with current and potential future ILM demands—along with factoring in business growth forecasts—can seem at times like walking a tightrope.

"From a budgetary perspective, you clearly have to look at what your growth looks like and how [information] is being used," Colburn said. "I think that begins to dictate how you slot [cost]. Thats what we look at. We look at regulatory needs [and] what were trying to accomplish. As a result, that dictates what our budget looks like."

Other companies are looking at ILM as a way to stretch limited budget dollars for storage. Despite sustaining an annual 59 percent data growth rate, UPMC's storage budget increase is capped at 25 percent. "That's the budget they give us. What we're doing and are expected to do, we're not allowed to grow out," said Furmanski in Pittsburgh.

UPMC has inked an eight-year, $402 million agreement with IBM to help transform its enterprise server and storage architecture into an on-demand and ILM-based vehicle.

After some "housecleaning" to collect storage utilization rates on its AIX, Solaris and Windows platforms and sweep away seldom-used data, UPMC opted to move its storage systems to one large, unified SAN situated directly behind IBMs TotalStorage SVC (SAN Volume Controller) virtualization technology, Furmanski said.

"Were really just starting to understand and show how we look at storage differently now, in terms of how we manage it, provision it and share it across the enterprise," Furmanski said.

By masking areas of complexity within UPMC's 350TB storage infrastructure, SVC makes it easier to identify available resources and to eliminate islands of storage that had previously been swallowing vast amounts of data onto ill-suited hardware.

Originally, UPMC planned to run the IBM TotalStorage DS8300 for electronic medical records, materials management and human resources records; the DS6800 midrange disk system for a second tier; and the DS4000 and DS4800, with low-cost SATA (Serial ATA) drives, as the third and final storage tier.

But SVC has changed all that by allowing UPMC to set up a separate group of storage service levels that are no longer tethered to physical storage hardware or devices.

CernerWorks' Graff said the ILM approach has given him more clarity on his storage management, but there is still work to do.

Created in 2000, CernerWorks' client base had outgrown two data centers by 2004. The strain of supporting 100 health care providers had exposed holes in the business unit's SAN architecture and led to sprawling storage fabrics, complex interswitch designs and multiplying management challenges.


CernerWorks' Tier 1 storage runs on HP's XP1024 and XP12000 high-end disk arrays. The top tier stores "life critical" production information containing patient data that's always available. It's imperative that the Tier 1 storage boxes are able to receive platform upgrades without system disruption.

Tier 2 features HP EVA (Enterprise Virtual Array) 5000 and EVA8000 midrange disk arrays. The boxes store nonproduction copied data and can be used for development needs.

For its Tier 3 storage, CernerWorks is evaluating the HP Medical Archiving Solution to house offline images from its PACS (Picture Archiving and Communication System). For backup, Tier 4 consists of HP StorageWorks 6510 Virtual Library Systems for rapid restore and recovery of seven days of backup, plus EMC Legato for slightly longer retention. Tier 5 offers HP and IBM tape products for long-term and off-site backup with disaster recovery in mind, Graff said.
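CernerWorks' five tiers, as described above, can be summarized as a simple lookup table. The hardware names come from the article; the table layout and `hardware_for` helper are just an illustrative way to organize them:

```python
# CernerWorks' five storage tiers as described in the article, expressed
# as a lookup table. The dict structure and helper function are
# illustrative, not an actual CernerWorks artifact.
TIERS = {
    1: {"purpose": "life-critical production patient data",
        "hardware": ["HP XP1024", "HP XP12000"]},
    2: {"purpose": "nonproduction copies and development",
        "hardware": ["HP EVA5000", "HP EVA8000"]},
    3: {"purpose": "offline PACS image archive (under evaluation)",
        "hardware": ["HP Medical Archiving Solution"]},
    4: {"purpose": "rapid restore of seven days of backup",
        "hardware": ["HP StorageWorks 6510 VLS", "EMC Legato"]},
    5: {"purpose": "long-term, off-site backup for disaster recovery",
        "hardware": ["HP tape", "IBM tape"]},
}

def hardware_for(tier: int) -> list:
    """Return the hardware backing a given tier."""
    return TIERS[tier]["hardware"]

print(hardware_for(1))  # ['HP XP1024', 'HP XP12000']
```

Laying the tiers out this way makes the cost gradient explicit: each step down the table trades access speed for cheaper media.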

One of the next big challenges for CernerWorks is to deploy a single centralized management suite to better manage its tiered storage environment. CernerWorks is evaluating storage resource management tools, such as HP Storage Essentials.

But even that won't get Graff to the finish line. CernerWorks plans to open two new data centers this year, followed by two more next year.

