University of San Francisco researchers are attempting to create a poor man's supercomputer cluster by linking desktop PCs and notebooks together through a conventional Ethernet network.
Part social movement, part scientific effort, the “FlashMob” software is named after the short-lived social phenomenon of random people coming together for brief activities. John Witchel, a USF graduate student and FlashMob co-creator, said he plans to make the software open-source. In about a week, USF will host about 1,000 people in its gym to demonstrate the technology, with a goal of cracking the Top 500 list of supercomputers.
The software is designed to put supercomputing back into the hands of the people. Massive clusters such as ASCI White and the Earth Simulator in Japan link hundreds of nodes together, at a cost of tens of millions of dollars.
“What this really means is that supercomputers are not available for use by ordinary folk,” Witchel said. “If you and I want to study the hole in the ozone layer, the answer is, we can't.”
The FlashMob software does not rely on the operating system installed on the PC. Instead, the software is burned to a CD-ROM, which then controls the PC when it is rebooted. A user then plugs into the local network, and the software begins scanning for other FlashMob clients. For security reasons, the hard drive is not accessed at all, Witchel said, which prevents personal information from leaking onto the network and keeps viruses and worms from spreading. When a user completes a task or must leave, all he or she needs to do is disconnect the PC, remove the CD-ROM and reboot.
FlashMob falls somewhere between a clustered supercomputer, which uses modular computing nodes or blades coupled with extremely fast interconnects, and distributed computing, where a server assembles chunks of data processed by PCs that are linked through the Internet, such as the SETI@home project.
FlashMob is perhaps most similar to the 1,100-node cluster of Apple Macintosh G5s, later replaced by Apple Xserve servers, that Virginia Polytechnic Institute and State University began assembling in August 2003.
What's the difference between FlashMob and the Virginia Tech cluster? “About $5.2 million,” Witchel replied. “Virginia Tech's effort was very impressive, and FlashMob would not exist without Virginia Tech leading the way,” he said. “But you'll recall they built a specific building for it, complete with a cooling system.”
Designing a Grass-Roots Supercomputer
Designing a grass-roots supercomputer required some equally basic, fundamental rethinking of the software. In any cluster, some of the key problems are combating latency and keeping all of the computing nodes in sync.
“Some of the hardest questions to deal with included the software and modification of the MPI stack,” Witchel said. “Maximizing the bandwidth on a 10-Gbit LAN requires very little work, as there are millisecond latencies. FlashMob is designed to work on 10/100-Mbit LANs.”
To solve the latency problem, Witchel said, a large network can be broken up into smaller subnets, keeping the network traffic confined to a small section of the network. He also said he found that diskless processing allowed easier checkpointing. Checkpointing, a feature of high-end clusters, allows a computational process to be saved and restarted by the management PC or node. In the FlashMob software, the diskless model allows each PC to compute longer processes, reducing the overall network traffic and the strain on the network switch, Witchel said.
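The checkpointing idea can be illustrated with a small sketch. It is not FlashMob code: the Manager class, the run_partial_sum function and the sum-of-squares workload are hypothetical stand-ins, and the checkpoint is held in the memory of a simulated management node rather than sent over a real network. The point is only that a long computation periodically hands a snapshot of its state to a manager, so another node can pick up from the last snapshot if a PC leaves.

```python
import pickle


class Manager:
    """Hypothetical management node: keeps only the latest checkpoint, in memory."""

    def __init__(self):
        self.latest = None

    def save(self, blob):
        self.latest = blob

    def restore(self):
        return None if self.latest is None else pickle.loads(self.latest)


def run_partial_sum(start, stop, state=None, checkpoint_every=1000, save=None):
    """Sum i*i for i in [start, stop), optionally resuming from a saved state."""
    if state is None:
        state = {"next_i": start, "total": 0}
    i, total = state["next_i"], state["total"]
    while i < stop:
        total += i * i
        i += 1
        # Hand a snapshot to the manager every `checkpoint_every` steps.
        if save is not None and (i - start) % checkpoint_every == 0:
            save(pickle.dumps({"next_i": i, "total": total}))
    return total


if __name__ == "__main__":
    manager = Manager()
    # First node works on the job but "leaves" after reaching 2,500.
    run_partial_sum(0, 2500, save=manager.save)
    # A second node resumes from the last checkpoint instead of starting over.
    resumed = manager.restore()
    result = run_partial_sum(0, 5000, state=resumed, save=manager.save)
    print(resumed["next_i"], result == sum(i * i for i in range(5000)))
```

In this sketch the work done between the last checkpoint and the moment a node leaves is simply repeated, which is the usual trade-off: less frequent checkpoints mean less network traffic but more lost work.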
The first generation of FlashMob software is limited to 1,200 Ethernet ports, Witchel said, but the team plans to enhance its capabilities through subsequent revisions.
Witchel co-designed the software with Pat Miller, a computer scientist at the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, and Craig Benson, a professor at USF. Witchel said the software, based on a modified Linux kernel, would form the basis for his thesis.
Witchel and the others tuned standard supercomputer libraries such as MPI for FlashMob and wrote original code to facilitate bootstrapping the PCs, reporting in real time, doing on-the-fly network and node diagnostics and optimizing ad-hoc performance.
On April 3, USF will invite researchers and guests to stop by its gym to attempt to put FlashMob I, as the ad-hoc supercomputer will be known, into the list of Top 500 supercomputers as measured by the LINPACK benchmark. Users have already committed 600 or so PCs to the project, Witchel said.
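For readers unfamiliar with LINPACK, the benchmark times the solution of a dense system of linear equations and converts the elapsed time into floating-point operations per second. The sketch below is a toy, single-machine illustration in pure Python, not the benchmark code a cluster would run. The function names are hypothetical, and the operation count of 2/3 n^3 + 2 n^2 is the commonly quoted approximation, assumed here rather than taken from the FlashMob team.

```python
import random
import time


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs are untouched
    b = b[:]
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
            b[r] -= factor * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(a[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (b[k] - s) / a[k][k]
    return x


def linpack_style_rate(n, seed=0):
    """Return (estimated flop/s, max residual) for a random n-by-n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    t0 = time.perf_counter()
    x = solve(a, b)
    elapsed = max(time.perf_counter() - t0, 1e-12)
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2  # assumed operation count
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return flops / elapsed, residual


if __name__ == "__main__":
    rate, residual = linpack_style_rate(100)
    print(f"{rate / 1e6:.2f} Mflop/s, residual {residual:.2e}")
```

A real run spreads one very large matrix across all nodes, which is why network latency and keeping nodes in sync matter so much for the final score.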
After FlashMob I is completed, Witchel hopes to use the software for other research purposes and social causes. After a university's lab has closed for the night, for example, the FlashMob software could be used for scientific research. Witchel also said he hopes to take the project to AIDS Walk San Francisco and the Avon Walk for Breast Cancer, where users could drop off their PCs in the morning, participate in the event and pick them up when they leave.
Network setup for FlashMob I will begin April 2, while the actual FlashMob will occur April 3 at the USF gym. Doors will open at 8 a.m., with arrival times staggered throughout the morning and assigned to participants when they register.