Eric Bremmer, a professor at Children's Memorial Hospital at Northwestern University, described grid-enabling data and text mining on the systems used to develop a biology knowledge base. His is a small research organization attached to the hospital, but he needed to integrate some 150,000 articles from five years' worth of 20 medical journals.
With the help of United Devices Inc., Bremmer got a small grid up and running. He said he's managed to cut analysis time, going from 5,000 articles in 24 hours to about 100,000 articles in 24 hours. He's using the same computers that are used by administrative staff or put to work on other research projects during the day, running the grid analysis work from 7 p.m. to 7 a.m. so as to stay out of administrators' hair.
Turning to a service provider is the only recourse for somebody like Bremmer, who has two research assistant professors and some post-docs, all of them medical types and none of them computer science types.
Another thing you need to worry about is whether you're in a regulated environment, he said. "We've tended to move toward commercial software because we need to have it FDA regulations-capable," he said. "And most open-source software is not because you can't lock it down, by nature."
So you get some service providers in-house. But what sort of skills do you need to lead the project? Wolfgang Gentzsch, a member of the GGF steering committee, a coordinator at D-Grid and a visiting scientist at the Renaissance Computing Institute at UNC/Chapel Hill, said you have to know enough to define the different steps of a project; to watch over the service provider, who will be bringing in various parts of the project; to measure the project's success; and to report back to the next level of management. But, Bremmer said, the training you'll receive in the process is "remarkable."
What are some of the pitfalls companies get into when they do grid themselves, without service providers? DataSynapse's Director of Business Development Dave Maples said his company often gets called into projects where somebody has attempted to build some kind of clustering or load-balancing project that they then proceeded to outgrow. That means the grid wouldn't scale, didn't have enough power, and/or didn't do resource identification very well. Bremmer pointed out another issue: In a research lab environment, the biggest problem is when a post-doc or somebody leaves. "To have a tool built by that person [means that] typically the knowledge leaves with that person," he said.
The problem was that more than 24 hours were needed to process about 5,000 articles on a single desktop computer. Another problem is that this stuff is time-sensitive: Scientific literature goes out of date rapidly, so you're only as good as your last update.