BALTIMORE - The engineers who take the Internet to its next levels of scalability will not be specialists but generalists, according to the CIO of Google.
Speaking at Surge 2011, The Scalability & Performance Conference here, Ben Fried told attendees that generalists, people versed in multiple disciplines who are willing to learn even more, will be required if the industry is to keep producing enterprises that can reach Google-like scale and beyond.
Speaking from experience, Fried laid out a scenario of so-called “disaster porn” based on a situation he endured while running the IT operation of a “large multinational investment bank” where he used to work. A look at Fried’s profile shows that bank to be Morgan Stanley.
“Disaster porn,” as he put it, is a hallmark of the Surge event, where engineers come to hear how others have overcome challenges and to learn from their mistakes and successes. Now in its second year, Surge is put on by a small yet pivotal Columbia, Md., firm known as OmniTI. Surge has attracted some of the biggest names in Web operations and Internet scale and performance, including representatives from Google, Yahoo, Heroku, Opscode, 10Gen, VoltDB and Joyent.
Fried said that to succeed in the scalable enterprise, engineers need to “understand the pathologies of failure.” His story of failure involved a trading application built on Internet infrastructure but presented to its users, the traders, as an ordinary desktop app. In scaling the system to support external traders, Fried’s team took shortcuts.
“By hijacking APIs developers already used inside the company, we just made it work easier for the desktop environment,” he said. “It was arrogant of us to think that through smart software, we could hide from developers and end users” that they were operating on a flawed system. “As I look at APIs and frameworks to build apps, there is a tendency to make things that are hard seem not so hard … and that doesn’t always work.”
Indeed, “We had to scale up the organization to deal with our own success, and without even thinking, the way we scaled up was through specialization,” Fried said. “We never said understanding how everything works is important.” That lesson, however, now “forms the approach we use at Google for operations.”
Fried said that because so many specialists were each doing a small part of the work to build out the application, very few people knew what other groups were doing. In fact, only two people knew what the app did from top to bottom: Fried and an assistant.
So, after receiving what he called “the call,” in which he was told to go to the trading floor and watch as the app that had made him a star nearly made him a pariah, Fried learned firsthand why he needed more generalists on his team. “I watched as a monitoring system for an app I designed moved from millions of dollars to zero in seconds.”
To break down the problem, Fried said, he quickly found himself in a large room briefing an extended team on how the application worked so that everybody could understand it. They eventually traced the root cause, which included a load balancer problem among other faults.
Specialization had clearly hurt in that case. “We had to rethink operations; operations is engineering,” Fried said. “We can’t allow technical barriers put up by the industry to separate us. … We need to reward and recognize generalist skill and reward end-to-end ownership.”
Fried said Google gets this right. “We go to great lengths to hire people with engineering skills,” he said. “We put really great engineers in these operational roles, and we make sure at the end of the day somebody is accountable.”
In addition, Fried said of generalists, “You need people who can work at a high level but can go all the way down to the applications.”
Asked whether generalists are made and taught or born, Fried said he believes they are born, because “it starts with an attitude,” one that features “a dedication to self-improvement. People who have this attitude, they don’t want to stop; they want to keep digesting and learning.”
For its part, Google has an internal program, or “university,” to cultivate these generalist types, Fried said. “It’s about people who resent boundaries and not knowing things,” he added.
Surge is the brainchild of OmniTI CEO and engineer Theo Schlossnagle and his team. OmniTI is a global IT services company with more than 10 years of success in Web design, Web applications development and managed services. Schlossnagle attended Johns Hopkins University in Baltimore and stuck around to found his company in nearby Columbia.
In opening remarks for the conference, Schlossnagle spoke on the need to develop a DevOps culture that maintains a focus on engineering. “We’re about engineering, all about engineering,” he said.
As an OmniTI blurb in the Surge program put it:
“Like many of the success stories at Surge, we acquired experience through trial and error, constant collaboration between development and operations teams, and an unwavering commitment to excellence. But we still lean on our friends and peers to see how things can be done better.”