Jon Burt is senior manager of software configuration management for the QuickBooks product line at Intuit, in Mountain View, Calif. Burt spoke with Technology Editor Peter Coffee about his use of technology from Electric Cloud, also of Mountain View, to accelerate application builds.
What issues led you to explore new application build technology?
Let me describe the situation that we had two years ago. We have a very large code base. For an individual to build it, end to end, could be a 2-hour process on a workstation. Generally, engineers [didn't] want to take the time, so the overnight build would fail because of compile-time errors.
My team would spend the morning figuring out who did what to whom, solving that and then restarting the build—which showed up about [1 p.m.] every day.
What we currently have is a continuous build system. A little daemon [process] monitors the different branches that people are working on; when it notices changes, it checks out all the code and launches a build.
The compile and link process runs on the Electric Cloud machine—which is able to turn this around in about 20 minutes. At that point, if it's a fail case, we send a message to the people who checked in code between the last run and this run; within 1 hour of that message, you need to correct it or [withdraw] your change. We then have a build that we can send through a basic acceptance test.
The developers get the feedback very quickly so they can correct their mistakes and have confidence that what they checked in not only compiles and links, but executes correctly through mainline paths.
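In outline, the continuous-build loop Burt describes might look like the following Python sketch. It is a minimal illustration only; the SCM and build helpers are stubbed placeholders, not Intuit's or Electric Cloud's actual tooling.

    import time

    POLL_INTERVAL = 60  # seconds between polls of the branch

    # Placeholder hooks: in a real deployment these would wrap the SCM
    # client and the cluster build command. Stubbed so the sketch runs.
    def latest_revision(branch): return "r1"
    def checkout(branch, rev): pass
    def run_build(branch): return True          # True = compile/link succeeded
    def committers_since(branch, old, new): return []
    def notify(people, message): print(people, message)

    def watch(branch):
        last_built = None
        while True:
            head = latest_revision(branch)
            if head != last_built:              # the daemon noticed changes
                checkout(branch, head)          # check out all the code
                if not run_build(branch):       # the ~20-minute cluster build
                    # Everyone who checked in between the last run and this
                    # one gets 1 hour to correct the break or withdraw.
                    notify(committers_since(branch, last_built, head),
                           "Build broke: fix or withdraw within 1 hour")
                last_built = head
            time.sleep(POLL_INTERVAL)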
What kind of hardware have you dedicated to that build process?
I think the purchase was about a year and a half ago—a set of 1U [1.75-inch] units, 50 of them—single-CPU with one disk drive, fairly fast, with a gigabyte of memory. [There are] two racks of 1U boxes, one cluster of machines [at] $800 a pop. We threw cheap hardware at it.
Electric Cloud has told me about its instrumented virtual file system that detects dependency violations: Did you find that it learned your application's architecture quickly? And is that a continuing benefit, or is your architecture stable enough that it's more of a one-time learning process?
We find that it's a continuing advantage. We have multiple release branches that are still in support. Branches that have already been released might get minor bug fixes or customer issues, and we would not see changes in dependencies there. On the next-year release, especially when we're in heavy development periods, we do see changes. The system handles that automatically behind the scenes.
Do you have to debug build scripts?
No, and that's just great. And the build script that the developers use is the exact same script that Electric Cloud uses, 99 percent of the time. Occasionally some nuance in a make file might throw Electric Cloud off; it's not a 100 percent solution, but 99 percent is fine by me.
I've also seen Electric Cloud's visualization tools for looking at build interactions. Have you applied those?
If we suddenly see build time increase by 10 minutes, we'll use that tool and discover that a make file has been changed—that a developer has introduced a dependency that [apparently] has to be serialized when that's not true.
So, when the system performance deteriorates, is that because the Electric Cloud system is applying a rule of “better safe than sorry”?
It's either irrelevant or actually incorrect, yes.
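To see why a single false dependency hurts, consider this toy Python scheduler, which is in no way Electric Cloud's algorithm: it groups build targets into waves that can run in parallel, and one chain of spurious make-file dependencies turns four simultaneous compiles into four serial ones.

    from graphlib import TopologicalSorter

    def waves(deps):
        """Group targets into waves; each wave can build in parallel."""
        ts = TopologicalSorter(deps)
        ts.prepare()
        rounds = []
        while ts.is_active():
            ready = list(ts.get_ready())
            rounds.append(ready)
            ts.done(*ready)
        return rounds

    # Four objects that truly depend on nothing but their own sources.
    honest = {"a.obj": set(), "b.obj": set(), "c.obj": set(), "d.obj": set()}
    print(waves(honest))    # one wave: all four compile at once

    # The same objects after someone chains them together in a make file.
    spurious = {"a.obj": set(), "b.obj": {"a.obj"},
                "c.obj": {"b.obj"}, "d.obj": {"c.obj"}}
    print(waves(spurious))  # four waves: the build is fully serialized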
Is there any new work, from your point of view, associated with having the Electric Cloud system in place? Does it create any new tasks for you?
There's maintenance of the [build cluster] hardware itself: One of the things we did was to rev from the 2003 compiler to the 2005 compiler. We had to schedule that across all 50 nodes. There's that aspect.
When we did our first deployment, Electric Cloud wasn't making very good use of multiprocessor machines, so we went with the cheapest hardware with one CPU. They have since beefed up their software to take advantage of some of the newer chips on the market—dual-core and all that good stuff. If I had to purchase the cluster again today, I'd be buying different hardware.
Would you consider today's sweet spot to be multiple CPUs?
Dual-core, dual-CPU would probably be where I'd put my money today.
Are you looking at those higher-density CPU chips in terms of performance versus cost, or also because of performance per watt or performance per square foot considerations?
If I had half the number of machines, my maintenance—just keeping all the machines at the same rev level—would be lower, [as would] the cost of just owning so many boxes. [Fewer machines] should put out less heat and consume less power.
Are there things you're looking toward in the next release of the Electric Cloud platform?
Yes, they're beefing up the ability to use the multiprocessor units. At some point, I hope we'll move to multiprocessor units.
Is that efficient use of multiprocessors more the responsibility of the operating system than of something above that level?
On each of the nodes in the cluster, there is an agent that runs; originally you could have only one agent. Now you can have multiple agents—there were some changes to their software. The agent runs at the application level, so there were also changes to the underlying file system to support multiple agents running on the same cluster node.
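Conceptually—and purely as a Python illustration, not Electric Cloud's implementation—one agent per core means a single node can pull several build steps off a shared queue at once:

    import multiprocessing as mp

    def agent(agent_id, jobs):
        # Each agent runs at the application level and executes build
        # steps as they become available; None is the shutdown signal.
        for job in iter(jobs.get, None):
            print(f"agent {agent_id} builds {job}")

    if __name__ == "__main__":
        jobs = mp.Queue()
        for step in ["a.cpp", "b.cpp", "c.cpp", "d.cpp"]:
            jobs.put(step)
        n = mp.cpu_count()               # one agent per core on this node
        for _ in range(n):
            jobs.put(None)               # one shutdown signal per agent
        workers = [mp.Process(target=agent, args=(i, jobs)) for i in range(n)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()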
What does your build farm look like to your compilers and other development tools?
It looks like anything else—they don't even know that [Electric Cloud] is running. We're running Windows XP.
Do your developers see anything at all other than the typical Visual Studio build interaction?
It's completely invisible. To use the cluster effectively, you have to be on a high-bandwidth connection. We get zero benefit from executing a Boston build on a Mountain View cluster because of the amount of bandwidth that would be consumed.
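Back-of-envelope arithmetic shows why. The figures below are assumptions for illustration, not Intuit's measurements: a source-and-output tree of a few gigabytes moves in under a minute on a gigabit LAN but takes more than an hour on a modest WAN link, longer than the entire 20-minute build.

    tree_gb = 5                      # assumed size of sources plus build outputs
    lan_mbps, wan_mbps = 1000, 10    # assumed gigabit LAN vs. shared WAN link

    for name, mbps in [("LAN", lan_mbps), ("WAN", wan_mbps)]:
        seconds = tree_gb * 8 * 1024 / mbps   # GB -> gigabits -> seconds
        print(f"{name}: {seconds / 60:.1f} minutes just moving data")
    # LAN: ~0.7 minutes; WAN: ~68 minutes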
So, what's now heading your top-problems list?
Better efficiency of testing, and moving more of our development to [Microsoft's] .Net.
Is Electric Cloud working specifically on optimizing the process of building large .Net applications?
Yes, we're getting specific support for building .Net assemblies. We've made that request, let me put it that way: It's a brave new world.
When you talked about efficiency of testing, did you mean the time to run the tests or the specificity of the results?
More a matter of making testing easily accessible so that remote teams can run tests before checking their code in. It's a matter of best practices, letting developers run tests against their own code to make sure they haven't broken something: design, build, test; design, build, test. I think it relates well to the rapid-prototyping and Extreme Programming philosophies.
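A minimal pre-checkin gate along those lines might look like the Python sketch below; the make command and the run_tests.py suite are hypothetical stand-ins for whatever build and test tooling a team actually uses.

    import subprocess
    import sys

    def gate():
        # Design, build, test: run the same loop locally before check-in.
        for name, cmd in [("build", ["make", "all"]),
                          ("tests", [sys.executable, "run_tests.py"])]:
            if subprocess.call(cmd) != 0:
                sys.exit(f"{name} failed: fix before checking in")
        print("build and tests passed: safe to check in")

    if __name__ == "__main__":
        gate()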