Designing a Grass-Roots Supercomputer
Designing a grass-roots supercomputer required some equally basic, fundamental rethinking of the software. In any cluster, some of the key problems are combating latency and keeping all of the computing nodes in sync. "Some of the hardest questions to deal with included the software and modification of the MPI stack," Witchel said. "Maximizing the bandwidth on a 10-Gbit LAN requires very little work, as there are millisecond latencies. FlashMob is designed to work on 10/100-Mbit LANs."
The first generation of FlashMob software is limited to 1,200 Ethernet ports, Witchel said, but the team plans to enhance its capabilities through subsequent revisions. Witchel co-designed the software in conjunction with Pat Miller, computer scientist at the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, and Craig Benson, a professor at USF. Witchel said the software, based on a modified Linux kernel, would form the basis for his thesis. Witchel and the others tuned standard supercomputer libraries such as MPI for FlashMob and wrote original code to facilitate bootstrapping the PCs, reporting in real time, doing on-the-fly network and node diagnostics and optimizing ad-hoc performance.
On April 3, USF will invite researchers and guests to stop by its gym to attempt to put FlashMob I, as the ad-hoc supercomputer will be known, into the list of Top 500 supercomputers as measured by the LINPACK benchmark. Users have already committed 600 or so PCs to the project, Witchel said.
After FlashMob I is completed, Witchel hopes to use the software for other research purposes and social causes. After a university's lab has closed for the night, the FlashMob software could be used for scientific research, for example. Witchel also said he hopes to take the project to AIDS Walk San Francisco and the Avon Walk for Breast Cancer, where users could drop off their PCs in the morning, participate in the event and pick them up when they leave.
Network setup for FlashMob I will begin April 2, while the actual FlashMob will occur April 3 at the USF gym. Doors will open at 8 a.m., with arrival times staggered throughout the morning and assigned to participants when they register.
To solve the latency problem, Witchel said, a large network can be broken up into smaller subnets, keeping the network traffic confined to a small section of the network. He also said he found that diskless processing allowed easier checkpointing. Checkpointing, a feature of high-end clusters, allows a computational process to be saved and restarted by the management PC or node. In the FlashMob software, the diskless model allows each PC to compute longer processes, reducing the overall network traffic and the strain on the network switch, Witchel said.
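The subnetting idea Witchel describes can be illustrated with a short sketch. The article does not say how the FlashMob software actually partitions its network, so the address range (10.0.0.0/21), the figure of 100 PCs per subnet and the function name below are assumptions chosen only for illustration. The sketch uses Python's standard ipaddress module to carve one large address block into equal-size subnets, each just big enough for a group of PCs.

```python
import ipaddress


def split_into_subnets(network_cidr, hosts_per_subnet):
    """Split a network into equal-size subnets that each fit hosts_per_subnet PCs.

    Hypothetical helper for illustration; not taken from the FlashMob software.
    """
    network = ipaddress.ip_network(network_cidr)

    # Find the smallest number of host bits whose usable address count
    # (total minus the network and broadcast addresses) covers the group.
    host_bits = 1
    while (2 ** host_bits) - 2 < hosts_per_subnet:
        host_bits += 1

    new_prefix = network.max_prefixlen - host_bits
    if new_prefix < network.prefixlen:
        raise ValueError("network is too small for the requested subnet size")

    return list(network.subnets(new_prefix=new_prefix))


if __name__ == "__main__":
    # Assumed values: one /21 block, groups of up to 100 PCs.
    for subnet in split_into_subnets("10.0.0.0/21", 100):
        print(subnet, "usable hosts:", subnet.num_addresses - 2)
```

With these assumed numbers, each group of PCs lands in its own /25 block, so broadcast traffic stays inside that block rather than reaching every machine on the floor.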