Open-Source Projects Target Dispersed Storage Grids, Backup

Enterprise-level storage finds the LinuxWorld spotlight with Cleversafe, a storage grid technology that automatically disperses data in slices across the Internet, and Zmanda, the commercial version of the longstanding archive and backup software project.

SAN FRANCISCO—With a RAID Level 6 demonstration seemingly on display at every corner, enterprise storage is making itself very evident here at the LinuxWorld Conference & Expo. At the same time, several storage-centric open-source community projects (and their commercialized siblings) look to challenge the established order in backup and redundancy.

On the LinuxWorld floor were the Cleversafe Project, a new wide-area, dispersed storage grid that appears to hosts as a mountable drive, and Zmanda, the commercial version of Amanda (Advanced Maryland Automatic Network Disk Archiver), an open-source network backup and archive system.

Cleversafe currently comprises two projects: Cleversafe Dispersed Storage, the storage grid, and the DSGrid File System, which lets the grid present itself as a mountable file system for Linux-based applications.

Cleversafe uses information dispersal algorithms developed for the project that slice data into pieces. Along with the data slices are "coded slices," which contain parity values that can be used to rebuild the entire original piece of data. These sets of slices, called Storage Slices, are dispersed across the Internet in different locations.

When the stored data is called up, the slices are retrieved from the grid. However, not all the slices are needed; any majority of them is enough to recreate the data. For example, in an 11-part grid, only six Storage Slices are needed to recreate the data.
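The six-of-11 arithmetic can be illustrated with a short sketch. This is not Cleversafe's code, and the article does not describe the project's algorithm in detail. The example below assumes a textbook polynomial (Reed-Solomon-style) scheme over the integers modulo 257, and the names `disperse`, `rebuild` and `_interpolate` are hypothetical. Because this toy version keeps the first six slices as plain data, it shows only the redundancy property, not the privacy the project claims.

```python
# Toy k-of-n dispersal using arithmetic modulo a prime.
# Assumption: a generic polynomial scheme, not Cleversafe's actual algorithm.
P = 257  # smallest prime above 255, so every byte value is a valid element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, n: int = 11, k: int = 6) -> dict:
    """Split data into n slices; slices 1..k are data, k+1..n are coded."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = {s: [] for s in range(1, n + 1)}
    for off in range(0, len(padded), k):
        block = padded[off:off + k]
        points = list(zip(range(1, k + 1), block))
        for s in range(1, n + 1):
            # Coded values can reach 256, so slices hold ints, not bytes.
            value = block[s - 1] if s <= k else _interpolate(points, s)
            slices[s].append(value)
    return slices


def rebuild(available: dict, length: int, k: int = 6) -> bytes:
    """Recreate the original bytes from any k of the slices."""
    chosen = sorted(available.items())[:k]
    if len(chosen) < k:
        raise ValueError("not enough slices to rebuild the data")
    out = bytearray()
    for b in range(len(chosen[0][1])):
        points = [(s, values[b]) for s, values in chosen]
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])
```

Calling `disperse(data)` and then passing any six of the 11 returned slices to `rebuild` along with the original length returns the original bytes; with five or fewer slices, `rebuild` raises an error.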

According to project members, the dispersed architecture improves data security and privacy and lowers storage costs. Unlike the usual backup architecture, where entire copies of data are put in backup sets and moved about, the information in the dispersed Cleversafe slices can't be used or understood on its own. The slicing technology itself provides off-site redundancy as well as some degree of privacy and security.

"With copy-based storage, you have the trade-off that more reliability means less security and more cost. With dispersal, you can engineer your level of reliability and it doesn't increase cost because you don't actually store more data, you just disperse it more," said Chris Gladwin, president of Cleversafe, the Chicago-based company that expects to commercialize the technology as a service.

Of course, the performance threshold in this case becomes the speed at which the data can be pulled off the Internet or network. With new higher-speed extensions to TCP/IP on the horizon, that should only improve the potential performance of the Internet storage grid, he said.

Gladwin said that a future version of the software will poll the storage sites at intervals and determine whether it's faster at any given moment to wait for all slices to come down the pipe or to retrieve fewer slices and rebuild the data using the parity code.
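How such a decision might be made can be sketched in a few lines. The feature was described only as a future plan, so everything here is an assumption: the function name `plan_retrieval`, the idea that polling yields one latency figure per site, and the fixed `rebuild_cost_ms` allowance are all hypothetical.

```python
def plan_retrieval(latency_ms, data_sites, k, rebuild_cost_ms=0.0):
    """Pick a retrieval strategy from polled per-site latencies.

    latency_ms: dict mapping site id to its latest measured latency.
    data_sites: the site ids that hold the plain data slices.
    k: how many slices are needed to recreate the data.
    """
    # Option 1: wait for every data slice; the slowest site sets the pace.
    direct = max(latency_ms[s] for s in data_sites)

    # Option 2: take the k fastest sites of any kind, then pay for the rebuild.
    fastest = sorted(latency_ms, key=latency_ms.get)[:k]
    rebuilt = latency_ms[fastest[-1]] + rebuild_cost_ms

    if rebuilt < direct:
        return "rebuild", sorted(fastest)
    return "direct", sorted(data_sites)
```

In this sketch, a single slow data site is enough to tip the choice toward fetching coded slices from faster sites and rebuilding, while a tie goes to the direct path.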

According to Gladwin, the calculation overhead is minimal.

"The IDA is all modular arithmetic, which means additions and subtractions, things that computers do real fast. In other words, the dispersal and recreation of data happens in real time. It's faster than wrapping or unwrapping the packet," he said.

The first test version of the software was released in April. A demonstration grid built using beta software is currently available for research purposes, Gladwin said. It uses 11 hosting points in North America.
