Like the rising ocean tides expected with global warming, a creeping level of data could overcome the world's storage capacity in the next few years, warns an analyst report. Even more worrisome to some will be the lack of comprehensible illustrative devices that will let us picture this growing level of storage.
The origin of this column was a long-overdue cleaning of my home office and an investigation of the dark recesses of my office closet. There are only so many times that you can simply buy another box or rent another storage compartment. (Well, there isn't really such a limit; however, I finally reached the budget limit on keeping all this stuff around.)
Hauling out stacks of boxes, I dragged plenty of “treasures” into the glare of a hanging compact fluorescent bulb.
The tour down memory lane included a SCSI-1 hard drive from the late 1980s (the size of a large toaster) and a jumble of cables that go along with it (sigh, they just don't make cables like the 50-pin Centronics anymore); stacks of floppy diskettes with no working drive in sight; and an Iomega Jaz SCSI-2 drive, which used removable hard disk cartridges. I don't even remember using the Jaz, but I must have. I have some 1GB carts for it.
I even found a 1GB cartridge for the Datasonix Pereos drive, which was a portable tape drive with teeny-weeny cartridges. How small? Seeing is the only way to believe.
So, I appear to be rich in old data. But I'm really not so encumbered, since I don't have ready access to this data. Or any access to the data, when we get down to it!
I've decided that it's way too much trouble to scrounge through this virtual junk heap in order to cull out a few scraps of personal and business history. For the moment, I will forget the search for the device drivers, the power adapters and the host bus adapter cards that would be needed to even start this investigation.
So, I will live with my policy: Either I migrated the data from these devices in the past and it's buried deep somewhere in an old but perhaps accessible archive, or I didn't and the data is gone for good.
These drives and the associated stacks of media reminded me of an IDC white paper titled “The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010.” (PDF) The study came out in the spring and builds on the “How Much Information?” research done by a team at the University of California at Berkeley. Both the new report and the previous studies were funded by EMC.
The earlier research looked at all information sources, including paper, film, recordings and digital data. As it says in the title, the new report predicts big data growth ahead in the coming years. And now.
The report lists a number of reasons for the growth: the increasing adoption of digital devices such as cameras, media players and digital television; the shift from analog workflows to digital ones, especially in the small business market; the increasing number of portable digital devices that can handle e-mail, voice and IM traffic; and the trend towards quality, which boosts file sizes.
The total amount of data created annually will grow from 161 exabytes in 2006 to 988 exabytes in 2010, or almost a zettabyte a year. While much of the growth in data will be in the consumer sector, the enterprise will touch much of this data, the report predicted.
For the enterprise and now smaller businesses, compliance requirements will continue to drive storage growth, both for near-line storage and for archival storage, it said.
The report targets a number of areas driving growth in enterprise storage:
- Enterprises will need to store a vastly greater volume of IM traffic. The estimate is some 250 million accounts in 2010.
- The report assumes that VOIP (voice over IP) will be integrated into business networks and that the digital phone calls will need to be included in compliance management systems.
- Enterprises will increasingly serve and store a greater range of rich-media content, such as Web conferences and plain audio and video content in podcasts and videocasts.
- Organizations will increase the use of digital surveillance cameras and store the resulting video images.
“But whether this information gets stored permanently or not, it will be transported over networks, shuttled from switch to switch, stored temporarily somewhere, and otherwise require use of networking and storage infrastructures, both those in organizations and those in carriers, hosting firms and other digital information service providers,” the report said.
Looking at the same time frame, IDC added up the predicted growth rates for all the different storage media and calculated a shortfall. It said the “media available to store the newly created and replicated bits and bytes of the digital universe will grow 35 percent a year from 2006 to 2010, or from 185 exabytes to 601 exabytes.”
Until this year, capacity has outstripped the data. However, if IDC is right, we will quickly develop a capacity gap over the next few years.
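The size of that gap can be worked out from the endpoint figures quoted above. What follows is only a rough sketch: it takes the report's 2006 and 2010 numbers and assumes smooth compound growth in between, which is my assumption and not IDC's year-by-year forecast. The function names are made up for illustration.

```python
# Endpoint figures quoted from the IDC report, in exabytes.
DATA_2006, DATA_2010 = 161, 988          # information created and replicated
CAPACITY_2006, CAPACITY_2010 = 185, 601  # storage media available
START_YEAR, END_YEAR = 2006, 2010
SPAN = END_YEAR - START_YEAR


def annual_growth(start, end, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end / start) ** (1 / years) - 1


def interpolate(start, end, years, elapsed):
    """Value after `elapsed` years, ASSUMING smooth compound growth."""
    return start * (end / start) ** (elapsed / years)


def yearly_surplus():
    """Return (year, data, capacity, capacity - data) for each year."""
    rows = []
    for elapsed in range(SPAN + 1):
        data = interpolate(DATA_2006, DATA_2010, SPAN, elapsed)
        capacity = interpolate(CAPACITY_2006, CAPACITY_2010, SPAN, elapsed)
        rows.append((START_YEAR + elapsed, data, capacity, capacity - data))
    return rows


if __name__ == "__main__":
    print(f"data growth per year:     {annual_growth(DATA_2006, DATA_2010, SPAN):.0%}")
    print(f"capacity growth per year: {annual_growth(CAPACITY_2006, CAPACITY_2010, SPAN):.0%}")
    for year, data, capacity, surplus in yearly_surplus():
        print(f"{year}: data {data:6.0f} EB, capacity {capacity:6.0f} EB, surplus {surplus:+6.0f} EB")
```

Under that smoothing assumption, data outgrows capacity by a wide margin each year, the surplus turns negative in 2007, and the 2010 endpoints alone leave a shortfall of 387 exabytes.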
But do the assumptions add up?
How Big Is All
All this worry over a storage shortfall must be taken with a large spoonful of salt. These predictions are based on a good number of “what-ifs” and don't seem to take into account any change in technology, behavior or regulation.
For example, the assumption that compliance requirements will extend to VOIP seems a stretch. Of course, it's possible to collect information, but to analyze it and pull anything useful from it would be a nightmare.
In addition, I keep reading that the current legislative trend regarding Sarbanes-Oxley compliance is toward loosening regulation, not increasing its strictures, whereas the IDC report expects that everything that can be collected and archived will be.
Besides, wouldn't there be a market solution to this problem?
If storage becomes scarce, then the cost for it will go up. Consumers and enterprises alike will find ways of extending their existing resources through data deduplication or stricter storage policies.
We've lived through an era when areal density of HDDs has grown at a breakneck pace, providing plenty of room for every duplicate file or terrible, blurry photo. The solution has been to keep buying more storage.
Just as with my experience this past week hauling out the ancient storage from the closet, consumers will just toss out the bad photos and the duplicate audio files, and then be more selective when keeping any new data.
If storage costs rise, better calculations of IT and storage costs will be needed at budgeting time. Whatever the outcome, the extra costs will be added up and passed along to the business end user and consumer.
According to the report's section on assumptions, it was conservative in calculating certain storage areas, such as some sensor data and music files.
“We estimated the number of legal song sales (CD and Web distribution) and added a conservative estimate of songs illegally distributed. It is quite possible that we were too conservative in our estimate of illegally shared songs over peer-to-peer networks,” the report stated.
So, it could be a bigger problem. Or not.
While the IDC report was no Harry Potter and the Growth of Enterprise Data, it had lots of fun sidebars and colorful charts. One interesting section offered some physical illustrations of how much data this all comprises.
For example, the paper said the current stored data in book format would equal 12 stacks of books extending from the Earth to the sun. Or a single stack looped twice around the Earth's orbit of the sun. (It didn't specify whether this stack was of the paperback or hardbound edition.)
“By 2010 the stack of books could reach from the sun to Pluto and back. In 2006 those books would represent about 6 tons of books for every man, woman and child on Earth. A large adult elephant weighs about 6 tons.”
I've never liked these distance comparisons, mainly because there's no way that they can be imagined anymore. How can you compare a book in your hand (paperback or hardcover) with the distance from the Earth to the sun? Or picture the bookshelf that would hold that stack of books floating around the orbit of the Earth?
Perhaps a better way to imagine the digital universe would be to compare volumes, weights or liquid measures. If a megabyte of data is a shot glass of bourbon, what represents all the data in the world?
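For the curious, here is a back-of-the-envelope sketch of that bourbon comparison. The unit choices are mine and not the report's: a decimal megabyte (10^6 bytes), a decimal exabyte (10^18 bytes), and a shot glass of roughly 44 milliliters (about 1.5 U.S. fluid ounces).

```python
# All three constants are assumptions made for illustration.
BYTES_PER_MEGABYTE = 10**6   # decimal megabyte
BYTES_PER_EXABYTE = 10**18   # decimal exabyte
SHOT_GLASS_LITERS = 0.044    # roughly a 1.5 U.S. fl oz shot

LITERS_PER_CUBIC_KM = 10**12  # 1 km^3 = 10^9 m^3 = 10^12 liters


def bourbon_cubic_km(exabytes):
    """Volume of bourbon, in cubic kilometers, at one shot per megabyte."""
    megabytes = exabytes * BYTES_PER_EXABYTE / BYTES_PER_MEGABYTE
    liters = megabytes * SHOT_GLASS_LITERS
    return liters / LITERS_PER_CUBIC_KM


if __name__ == "__main__":
    # 161 and 988 exabytes are the report's figures for 2006 and 2010.
    for year, exabytes in ((2006, 161), (2010, 988)):
        print(f"{year}: about {bourbon_cubic_km(exabytes):.1f} cubic km of bourbon")
```

With those assumptions, the 2006 figure of 161 exabytes comes to about 7 cubic kilometers of bourbon, and the 2010 figure to roughly 43.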