SAN DIEGO—When MySpace.com quietly launched its online social networking service in September 2003, few could have predicted that in less than three years the tiny company would explode into an Internet behemoth that now counts 65 million users and handles 4.5 million transactions per minute.
Running parallel to MySpace's meteoric growth in scale and data storage needs is its ascension into the rare pantheon of cultural phenomena.
The site is adding 250,000 new users each day. It's a safe bet that someone in your family, group of friends or circle of co-workers has not only heard of MySpace but is likely an active user of the Web site.
However, sustaining that type of unprecedented growth, while simultaneously enabling systems to endure the onslaught of members putting their audio, video and image files into MySpace's storage and database systems, requires careful forecasting.
In fact, to hear Aber Whitcomb, chief technology officer of the Santa Monica, Calif., company, discuss the subject at Storage Networking World here the week of April 3, his company's skyrocketing popularity and success can be directly attributed to making smart choices about IT infrastructure: anticipating inevitable capacity constraints, planning for flexibility, and understanding precisely when it makes sense to move off a technology that has outlived its usefulness.
“The theme of our history is we max everything out. So we truly need a scalable architecture in order to handle that,” Whitcomb said, adding that MySpace plans to be at 100 million users by January 2007.
With a primary age demographic of 14 to 34, Whitcomb said, MySpace.com's user snapshot includes trendsetters, music and film buffs, and gamers, who have built their own massive online community. The MySpace Web site absorbs 1.5 million new images each day and has stored 430 million images in total.
MySpace's extensive IT architecture currently features 2,682 Web servers, 90 cache servers with 16GB of RAM, 450 DART servers, 60 database servers, 150 media processing servers, 1,000 disks in a SAN (storage area network) deployment, three data centers and 17,000MB per second of bandwidth throughput.
In the earliest days of building the MySpace juggernaut, Whitcomb said, data accumulation rapidly outpaced the storage capacity and servers necessary to process transactions, and, just as important, the software needed to make the entire operation less taxing on underlying hardware.
In the beginning, MySpace.com ran a two-tiered architecture: a single database behind load-balanced Web servers. While that configuration was great for rapid development, thanks to its simplicity and the lower cost of fewer hardware components than a multi-site deployment, it proved ineffective as traffic climbed. At the 500,000-user mark, Whitcomb knew a change needed to be made.
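In broad strokes, that original setup can be pictured with a short Python sketch; the host names and round-robin helper below are illustrative placeholders, not MySpace's actual configuration. A load balancer rotates requests across the Web tier, and every Web server leans on the same lone database.

# A rough model of the early two-tier architecture, using placeholder host
# names: requests rotate across load-balanced Web servers, and every Web
# server talks to the same single database.
from itertools import cycle

WEB_SERVERS = ["web01.example.com", "web02.example.com", "web03.example.com"]
DATABASE = "db01.example.com"          # the lone back-end database

_round_robin = cycle(WEB_SERVERS)      # simplest possible load balancing

def route_request(path):
    """Pick the next Web server; all of them share one database."""
    return {"web": next(_round_robin), "db": DATABASE, "path": path}

print(route_request("/profile/42"))    # every request ends up on db01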
“We realized a single database wasn't going to cut it. We maxed out our database on the back end. The first thing you try to do is tune all your queries, split reads and writes across separate databases,” and use transactional replication so multiple databases can service the required reads, Whitcomb said.
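Read/write splitting of the kind Whitcomb describes might look roughly like the following sketch, assuming hypothetical host names and a plain verb check: writes land on a primary database, while reads fan out to replicas kept current through transactional replication.

# Read/write splitting sketch with placeholder host names, not MySpace's
# actual code: writes go to the primary, reads are spread over replicas
# that transactional replication keeps up to date.
import random

PRIMARY = "db-primary.example.com"
READ_REPLICAS = ["db-read1.example.com", "db-read2.example.com"]
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}

def pick_database(sql):
    """Return the host that should service this SQL statement."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    return PRIMARY if verb in WRITE_VERBS else random.choice(READ_REPLICAS)

print(pick_database("SELECT name FROM profiles WHERE user_id = 42"))
print(pick_database("UPDATE profiles SET headline = 'hi' WHERE user_id = 42"))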
At 1 million users, MySpace embraced vertical partitioning, splitting different features of the site onto different servers; e-mail, for instance, moved to its own server using transactional replication. However, that method didn't work for all workloads and data types.
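A hypothetical illustration of that vertical split, with made-up server names: each feature gets its own database, and application code routes by feature rather than by row.

# Vertical partitioning sketch: each site feature lives on its own database
# server. The feature names and hosts below are invented for illustration.
FEATURE_DATABASES = {
    "email":    "db-email.example.com",
    "profiles": "db-profiles.example.com",
    "blogs":    "db-blogs.example.com",
}

def database_for_feature(feature):
    """Route a request to the database dedicated to that feature."""
    return FEATURE_DATABASES[feature]

print(database_for_feature("email"))   # e-mail traffic never touches the profile database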
Once MySpace barreled past the 2 million-user mark, a bigger problem emerged: “[W]e were realizing we were having disk problems. We used SCSI arrays and [encountered] reliability and performance issues. We didn't have enough disk to handle I/O requirements,” Whitcomb said.
The decision to move data over to a SAN-oriented environment paid immediate dividends toward improving uptime, performance and redundancy, he said. It was then that MySpace shifted its database operations onto an EMC CLARiiON array.
But in the bursting-at-the-seams data realm that is MySpace, vertical partitioning soon became a less attractive way to divide data. So at 3 million users, MySpace rearchitected its database back end around horizontal partitioning.
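Horizontal partitioning, by contrast, keeps the same tables on every server but divides the rows, typically by user. A simplified range-based sketch, with invented shard boundaries and host names, shows the idea.

# Horizontal partitioning (sharding) sketch with invented shard ranges:
# every shard holds the same tables, but only a slice of the users.
SHARDS = [
    (0,         999_999,   "db-shard0.example.com"),
    (1_000_000, 1_999_999, "db-shard1.example.com"),
    (2_000_000, 2_999_999, "db-shard2.example.com"),
]

def shard_for_user(user_id):
    """Return the database that holds this user's rows."""
    for low, high, host in SHARDS:
        if low <= user_id <= high:
            return host
    raise ValueError("no shard configured for user %d" % user_id)

print(shard_for_user(2_500_000))       # -> db-shard2.example.com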
The decision paid off, but Whitcomb admitted that horizontal partitioning is a difficult task to undertake while systems are in production.
At 10 million users, MySpace realized it couldn't ascend to greater data heights without scalable back-end storage. While the online company did have disks in its SAN assigned to certain databases, troublesome hot spots developed on those disks, and once the disks were maxed out there wasn't much else that could be done to recoup capacity. The answer: storage virtualization and high-performance block-level SAN access from 3PARData.
“We decided [we] wanted to go with storage virtualization to create a software layer in between disk and host; then you can create a stripe across all those disks and have each database take [the] performance of that whole RAID group. This really, really helped us and eliminated hot spots across our architecture. We went with 3PAR for this,” Whitcomb said.
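The striping Whitcomb describes can be modeled in a few lines, with made-up chunk sizes and disk names: a virtualization layer maps each slice of a logical volume onto a different physical disk, so one busy database no longer hammers one small set of spindles.

# A toy model of striping behind a virtualization layer, with made-up chunk
# and pool sizes: each logical volume is cut into fixed-size chunks laid out
# round-robin across the whole disk pool, so no single disk becomes a hot spot.
STRIPE_CHUNK_MB = 256
DISKS = ["disk%03d" % n for n in range(1000)]     # the SAN's disk pool

def disk_for_offset(volume_offset_mb):
    """Map a logical volume offset to the physical disk holding that chunk."""
    chunk_index = volume_offset_mb // STRIPE_CHUNK_MB
    return DISKS[chunk_index % len(DISKS)]

# Sequential I/O against one database volume now touches many disks.
print([disk_for_offset(mb) for mb in (0, 256, 512, 768)])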
At the 30 million-user mark, MySpace took a hard look at its fastest-growing area, static content such as images, MP3s and videos, closely monitoring access and performance and plotting growth rates against demand, Whitcomb said. Another roadblock sprang up: traditional storage is not well suited to that kind of content and is not easily managed.
MySpace currently sets aside about 100TB for MP3s and videos, and another 200TB for dynamic content.
MySpace started with SATA (Serial ATA) RAIDs with hosts attached to them, and would put as many individual files as possible on those servers. Unfortunately, this created islands of storage, and once the storage hardware is maxed out it's very difficult to move data onto another box, Whitcomb said. “I would have engineers up all night. So we needed something different, we needed storage that truly scales…so we brought in Isilon.”
MySpace is deploying Isilon Systems software for MP3 and video streaming, clustering systems together to spread files and data across multiple storage nodes. The technology also eases storage capacity constraints, since new nodes can be added as necessary. Starting off with a two-node 3PAR frame, MySpace has since upgraded to an eight-node cluster. Each storage node delivers 600M bits per second, while each cluster spits out 10G bits per second.
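Spreading media files across clustered storage nodes can be sketched with a simple hash-based placement scheme; the node names, replica count and hashing approach below are illustrative assumptions, not Isilon's actual placement logic.

# File-placement sketch for a storage cluster, e.g. an eight-node setup;
# the node names, replica count and hash scheme are assumptions for
# illustration, not Isilon's real algorithm.
import hashlib

STORAGE_NODES = ["storage-node%d" % n for n in range(1, 9)]

def nodes_for_file(path, copies=2):
    """Choose which nodes store a file (and its mirror) based on its path."""
    digest = int(hashlib.md5(path.encode()).hexdigest(), 16)
    first = digest % len(STORAGE_NODES)
    return [STORAGE_NODES[(first + i) % len(STORAGE_NODES)] for i in range(copies)]

print(nodes_for_file("/videos/user42/clip001.flv"))
# Adding nodes grows capacity; real clustered systems rebalance data as the cluster grows.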
With plans to launch in multiple countries in the future (Whitcomb was in China last week to discuss how MySpace could coexist with that country's strict online policies), the ceiling is still sky-high for the company's growth and IT system expansion.