Scaling Toward the Petabyte

By Matthew Hicks  |  Posted 2002-06-17

Applications must be capable of handling huge warehouses.

If there's any doubt that enterprises are facing a data explosion, consider the next major threshold on the horizon for large databases. Within 18 months or so, experts say, a company or organization somewhere in the world will reach a petabyte of storage for a database.

More than likely, it will be for a massive data warehouse, not the transactional systems running day-to-day operations, and it likely will fall within retailing, financial services, health care or government. With the petabyte (1,000 terabytes) will come a growing need for data management systems that can scale even larger, perform even faster and manage themselves more automatically. "There's not an intrinsic technical meaning to the milestone," said Richard Winter, president of Winter Corp., in Waltham, Mass., a large-database consultancy. "It's something that makes people ... take notice."

It will also force database vendors, no matter how much they say their systems are ready, to improve their software, Winter said. Vendors such as Teradata, a division of NCR Corp.; IBM; and Oracle Corp. already are focusing on key areas crucial to managing such huge databases. Data management systems must be able to scale and grow rapidly through technologies such as clustering, Winter said. They must also maintain top performance even as demands on them increase, and they must provide self-management capabilities so that they do not place unmanageable demands on database administrators.

IBM, for one, will continue adding self-management capabilities to its DB2 Universal Database. The company launched new self-managing database tools earlier this month and plans further features with Version 8 later this year. Such advances will be critical to supporting databases that reach the petabyte range while maintaining high availability, said Jeff Jones, director of strategy at IBM, in San Jose, Calif.

Perhaps the largest data warehouse today belongs to Wal-Mart Stores Inc. Teradata, Wal-Mart's database vendor, estimates that the warehouse can hold 200 terabytes of data today. Winter said the figure includes the database's storage capacity, not the total amount of data in it, which is closer to 50 terabytes. Often, to improve performance, large data warehouses need more storage capacity than the actual amount of data collected. Still, at that size and with leading-edge companies doubling the size of their data warehouses every year, Winter predicted the first data warehouse with more than a petabyte of storage will be in production in 2004.
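The arithmetic behind that timeline can be sketched in a few lines of Python. This is only an illustration, not Winter's own method: it assumes strict doubling every year, starts from the 200-terabyte capacity and 50-terabyte data figures quoted above, and uses a hypothetical helper named years_to_reach.

```python
import math

PETABYTE_TB = 1000  # the article's definition: 1 petabyte = 1,000 terabytes


def years_to_reach(start_tb: float, target_tb: float, growth_factor: float = 2.0) -> float:
    """Years of compound growth needed for start_tb to grow to target_tb.

    growth_factor is the yearly multiplier; 2.0 means "doubling every year",
    which is an assumption taken from the article's description.
    """
    return math.log(target_tb / start_tb) / math.log(growth_factor)


# Starting from 200 TB of storage capacity (the Teradata estimate).
capacity_years = years_to_reach(200, PETABYTE_TB)

# Starting from roughly 50 TB of actual data (Winter's estimate).
data_years = years_to_reach(50, PETABYTE_TB)

print(f"From 200 TB of capacity: {capacity_years:.2f} years")
print(f"From 50 TB of data:      {data_years:.2f} years")
```

Under these assumptions, 200 terabytes of capacity crosses the petabyte mark a little more than two years after mid-2002, which is consistent with a 2004 prediction. The same growth rate applied to 50 terabytes of actual data takes about two years longer.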

Teradata Chief Technology Officer Stephen Brobst agreed, saying he expected a customer to deploy a petabyte database within 18 months. Brobst, in Dayton, Ohio, said the holdup isn't technological but rather customer readiness. Teradata this month certified the database components necessary to build a petabyte database, Brobst said. "It's all about how much detailed information can you have that makes economic sense," he said.

Enterprises such as CNN, a division of Turner Broadcasting Inc., are eyeing a petabyte. CNN has used IBM's DB2 to underpin an archive of its news footage. The archive still is in its early stages of transferring some 1.2 petabytes into the system, which could take at least five years, said Gordon Castle, senior vice president of CNN technology, in Atlanta. Within a year, Castle expects to be archiving as much as 240 terabytes of data every year from new footage. Technically, the system won't be a full-fledged petabyte database because the database itself won't store the data objects but will point to them through metadata, he said. One of his biggest concerns is making sure the system can continue to perform well with fast query responses as demands increase. The archive won't be static but will instead be accessed daily.
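To give a sense of the load those figures imply, here is a rough back-of-the-envelope calculation in Python. It assumes the 1.2-petabyte backlog is spread evenly over exactly five years and that new footage arrives at the full 240 terabytes a year; both are simplifications for illustration, not figures Castle gave, and the variable names are hypothetical.

```python
BACKLOG_TB = 1200          # 1.2 petabytes of existing footage, in terabytes
TRANSFER_YEARS = 5         # assumed: exactly five years, spread evenly
NEW_FOOTAGE_TB_PER_YEAR = 240  # upper figure quoted for new footage
DAYS_PER_YEAR = 365

# Yearly rate needed just to work through the existing backlog.
backlog_tb_per_year = BACKLOG_TB / TRANSFER_YEARS

# Backlog transfer plus new footage arriving at the same time.
combined_tb_per_year = backlog_tb_per_year + NEW_FOOTAGE_TB_PER_YEAR
combined_tb_per_day = combined_tb_per_year / DAYS_PER_YEAR

print(f"Backlog alone:      {backlog_tb_per_year:.0f} TB per year")
print(f"Backlog + new:      {combined_tb_per_year:.0f} TB per year")
print(f"Combined daily avg: {combined_tb_per_day:.2f} TB per day")
```

On these assumptions, the backlog alone works out to the same 240 terabytes a year as the projected new footage, so the archive would need to absorb well over a terabyte a day on average while also serving daily queries.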

"This is somewhat heavy lifting for a database, and there's [a] lot of access and controlled movement of content, and the files are quite large," Castle said.

Even after the milestone is reached, a petabyte database will remain the exception for years. Most enterprises don't have compelling-enough business reasons to build and manage such massive databases. United Airlines Inc. is a good example. The airline is building a 6-terabyte data warehouse but has been taking a slower-growth approach by focusing more on analyzing real business issues than on the warehouse's sheer size, said Casey Hossa, director of sales, marketing and call center technology at United, in Elk Grove Village, Ill. "There's a lot of [data] you can capture, but we're ... trying to find those things that are core to the organization," Hossa said.

As an online reporter, Matt Hicks covers the fast-changing developments in Internet technologies. His coverage includes the growing field of Web conferencing software and services. With eight years as a business and technology journalist, Matt has gained insight into the market strategies of IT vendors as well as the needs of enterprise IT managers. He joined Ziff Davis in 1999 as a staff writer for the former Strategies section of eWEEK, where he wrote in-depth features about corporate strategies for e-business and enterprise software. In 2002, he moved to the News department at the magazine as a senior writer specializing in coverage of database software and enterprise networking. Later that year, Matt started a yearlong fellowship in Washington, DC, after being awarded an American Political Science Association Congressional Fellowship for Journalists. As a fellow, he spent nine months working on policy issues, including technology policy, for a Member of the U.S. House of Representatives. He rejoined Ziff Davis in August 2003 as a reporter dedicated to online coverage. Along with Web conferencing, he follows search engines, Web browsers, speech technology and the Internet domain-naming system.
