Facebook takes some heat from Google and others in the social networking community for being a walled garden, keeping the information users share within its friendly confines close to the vest.
But most of the software infrastructure that supports Facebook activities is in fact open source, free for programmers to adopt, customize and use. Google, Twitter, Yahoo and other Internet companies today also support their platforms with open-source products.
Royal Pingdom took time in June to catalogue Facebook’s software infrastructure and noted that the social network is largely a LAMP (Linux, Apache, MySQL and PHP) Website, albeit with some modifications.
Facebook uses Sun Microsystems’ open-source database MySQL for persistent storage, Memcached for distributed memory caching between its Web servers and MySQL servers, and distributed storage software Cassandra for inbox search.
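To illustrate how a memory cache can sit between Web servers and a database, here is a minimal sketch of the common cache-aside pattern. It is not Facebook's code: plain Python dictionaries stand in for Memcached and MySQL, and the class name `CacheAside` and its methods are hypothetical names chosen for this example.

```python
class CacheAside:
    """Toy cache-aside reader.

    Assumption: one dict stands in for the database (MySQL in the article)
    and another stands in for the cache (Memcached in the article).
    """

    def __init__(self, database):
        self.database = database  # hypothetical stand-in for MySQL
        self.cache = {}           # hypothetical stand-in for Memcached
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        # Serve from the cache when the key is already there.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def update(self, key, value):
        # Write to the database, then drop the stale cache entry
        # so the next read fetches the fresh value.
        self.database[key] = value
        self.cache.pop(key, None)


store = CacheAside({"user:1": "Alice"})
store.get("user:1")    # miss: goes to the database
store.get("user:1")    # hit: served from the cache
print(store.db_reads)  # 1
```

The second read never touches the database, which is the kind of load reduction a caching layer is meant to provide.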
Facebook also created and released as open source HipHop for PHP, a compiler that turns PHP into native code on Facebook’s Web servers.
“Not only is Facebook using (and contributing to) open-source software such as Linux, Memcached, MySQL, Hadoop, and many others, it has also made much of its internally developed software available as open source,” Royal Pingdom noted, citing HipHop, Cassandra, Thrift and Scribe as well as the Tornado Web server.
Other Internet companies use many of the same open-source tools Facebook employs. For example, Twitter uses Cassandra on its geo and research teams.
Facebook and Twitter also use Hadoop, an open-source MapReduce implementation that makes it possible to perform calculations on massive amounts of data. In fact, Hadoop seems to be used by every Internet company worth its salt, from Google to Amazon and Amazon Web Services to Yahoo.
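For readers unfamiliar with the map-reduce idea, the sketch below shows its three steps (map, group by key, reduce) on a word count, run in a single Python process. It does not use Hadoop or its APIs; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical names for this illustration. A real cluster spreads the same steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["open source wins", "open source scales"])
print(counts["open"])  # 2
```

Because each map call looks at only one document and each reduce call at only one key, the work can be divided among many machines, which is what makes the approach suit massive data sets.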
Yahoo is a heavy Hadoop user, burning through more than 100,000 CPUs in 36,000 computers for its Web search and ad platforms. The company contributed to the open-source project last month by releasing Hadoop with Kerberos security and a workflow engine for Hadoop.
Why, after years of enterprises powered by proprietary software infrastructure from Microsoft, Oracle, HP, IBM and others, are big Internet companies writing or adopting open source with such gusto?
Open Source Rules the Roost
RedMonk analyst Stephen O’Grady told eWEEK that open source often wins out over proprietary products these days because developing non-differentiating software in-house offers little return for the effort involved.
“While this open source, radically distinct infrastructure software would seem to pose a threat to the incumbent software vendors, that’s relatively far out. Most adoption of the specialized, Web-scale technologies is greenfield, if for no other reason than the fact that few if any packaged applications are capable of running on, say, non-relational datastores such as Cassandra.
“In the interim, we’re likely to see traditional providers uneasily coexist with the up and coming open-source alternatives that have sprung from Web native companies such as Facebook and Twitter.”
Bob Muglia, president of Microsoft’s enterprise software business, which is under siege by the burgeoning open-source movement, offered his own explanation for the shift to the New York Times:
“We did not get access to kids as they were going through college,” Muglia said. When people wanted to build a start-up, “and they were generally under-capitalized, the idea of buying Microsoft software was a really problematic idea for them.”
In short, today’s computer programmers go through college writing programs on software that wasn’t created by Microsoft, and the company missed its opportunity to win over bleeding-edge Web programmers.
The open-source approach is extending to the hardware segment to a degree, according to Pund-IT founding analyst Charles King:
“Many of the big search (Google/Yahoo) and social networking (FB, etc.) companies are leveraging x86 and Linux distros of one sort or another. The huge scale-out, cloud-style infrastructures they’re using (tens of thousands of barebones x86 boxes w/SW-based RAS features/functionality) are fundamentally/philosophically different from traditional Big Iron Systems.”
King said the first rank of losers includes traditional UNIX systems and enterprise software vendors, but there is a trickle-down effect on the services businesses of IBM and HP.
“In many cases their services organizations are also being shut out by Web-focused companies (Google’s reportedly a good example) who find that it’s cheaper to plan/design their own servers/storage & outsource fabrication to the same third-party, Asia-Pacific IT manufacturers used by many system vendors.”
This is a big reason why Dell, IBM, HP and others are moving aggressively into the cloud space. These vendors want to stanch any potential revenue hemorrhages from their traditional businesses and embrace the cloud computing model the Internet companies built with open-source software and commodity hardware.