With 2015 drawing to a close, Facebook looked back at its year of using, developing and contributing to open-source software.
In a blog post, Christine Abernathy, developer advocate for the Facebook open-source team, said the open-source program at Facebook has grown, not only in terms of new projects, but also in the size and strength of its community. Abernathy credited the growth to more than 3,400 developers who contributed to the company’s projects, the majority of whom were external.
“Some of our most widely adopted projects saw additional uptake in 2015. WordPress and Netflix revamped their products with React,” Abernathy said. “Etsy migrated to HHVM this year, and Box announced that our virtual machine would be the exclusive engine serving its PHP codebase. Presto, our interactive querying engine, is used by companies like Airbnb, Dropbox, and Netflix, as well as by Gree, a Japanese social media game-development company, and Chinese e-commerce company JD.com.”
React (also known as React.js or ReactJS) is an open-source JavaScript library that provides a view for data rendered as HTML. HHVM is an open-source virtual machine designed for executing programs written in Hack and PHP. HHVM was created as the successor to the HipHop for PHP execution engine, a PHP-to-C++ transpiler also created by Facebook.
Last year, React and HHVM became Facebook’s first projects to earn 10,000 stars on GitHub. Stars are a way for developers to keep track of repositories that they find interesting. Abernathy said three additional projects joined the 10,000-star ranks in 2015, while React tripled in popularity to become Facebook’s first project with more than 30,000 stars. Facebook’s open-source projects topping 10,000 stars are React, with 33,000 stars; React Native, with 24,000 stars; Pop, with 13,500 stars; HHVM, with 13,000 stars; and Immutable.js, with more than 10,000 stars.
React Native, which Facebook announced in March of 2015, reached 24,000 stars in just nine months to become the company’s second most popular open-source project. As of this month, React Native has 4,000 forks and more than 4,000 commits from more than 400 contributors.
Abernathy said other notable newcomers include Relay, a JavaScript framework for building data-driven React applications, and GraphQL, a data query language. In all, Facebook had 125 new launches this year, increasing the number of projects in production by 50 percent over last year, she said.
“Now, with more than 330 total repos, we value contributions from our community more than ever to help us collaborate on solutions for common challenges,” said Abernathy in her post. “We had more than 2,500 external contributors this year, up from 1,000 in 2014. A special shout-out goes to Teradata—which joined the Presto community this year with a focus on enhancing enterprise features and providing support—for having seven of our top 10 external contributors.”
Meanwhile, Facebook will be “doubling down” on its commitment to supporting open-source projects in 2016, she noted.
However, Facebook was not the only born-on-the-Web company to increase its commitment to open-source over the last year. Major Internet entities such as Facebook, LinkedIn, Twitter and Google contribute vast amounts of code to the open-source community and lead the way in producing much of the infrastructure software for the big data and cloud era.
For instance, in 2015, LinkedIn made its biggest contributions yet to the open-source community by open-sourcing more than 10 original projects, including Pinot, Burrow and Gobblin, and pushing significant updates to Samza, Rest.li, Kafka and Voldemort, four of the company’s most broadly adopted open-source projects, Igor Perisic said in a recent blog post.
Facebook, LinkedIn Reflect on 2015: The Year in Open Source
“We’ve worked to scale our infrastructure as we reached 400 million LinkedIn members, so it’s no surprise many of our open-source projects this year focus on building out our data pipelines and tools to help make sense of our data,” Perisic said. “The infrastructure improvements we’ve made in Kafka have allowed us to handle 1.3 trillion messages per day, and Espresso now serves 2.2 million rows per second.”
LinkedIn open-sourced Pinot, its real-time analytics infrastructure, in June. Pinot enables the company to slice, dice and scan through massive amounts of data in real time across a wide variety of products, said Kishore Gopalakrishna, a senior software engineer on the data infrastructure team at LinkedIn, in a blog post.
“At LinkedIn, we have a large deployment of Pinot storing hundreds of billions of records and ingesting over a billion records every day,” Gopalakrishna said. “Pinot serves as the backend for more than 25 analytics products for our customers and members. This includes products such as Who Viewed My Profile, Who Viewed My Posts and the analytics we offer on job postings and ads to help our customers be as effective as possible and get a better return on their investment. In addition, more than 30 internal products are powered by Pinot…”
In October, LinkedIn open-sourced PalDB, a lightweight companion for storing side data. LinkedIn developed PalDB to assist some of its machine learning efforts and storage needs. A question that often comes up at LinkedIn is how to improve the usability and memory efficiency of side data, said Matthieu Monsch, an engineer at LinkedIn, in a blog post. Side data is the extra read-only data needed by a process to do its job, he said.
For instance, a list of stop words used by a natural language processing algorithm is side data, Monsch said. Machine learning models used in machine translation, content classification or spam detection are also side data. When this side data becomes large, it can create a bottleneck for the applications that depend on it. PalDB addresses this by providing a new read-only embeddable database that makes it much easier to scale side data.
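To make the idea of side data concrete, here is a minimal Python sketch of the stop-word case Monsch describes. It is not PalDB and does not use PalDB's API. The file name, the JSON layout and the function names (write_side_data, load_side_data, remove_stop_words) are all hypothetical, chosen only to show the pattern: the data is written once ahead of time, then loaded by the process as a read-only lookup.

```python
import json
import tempfile
from pathlib import Path
from types import MappingProxyType


def write_side_data(path: Path, stop_words: list[str]) -> None:
    """Build step (hypothetical): write the side data once, ahead of time."""
    path.write_text(json.dumps({"stop_words": sorted(set(stop_words))}))


def load_side_data(path: Path) -> MappingProxyType:
    """Serving step (hypothetical): load the data and expose a read-only view."""
    raw = json.loads(path.read_text())
    # frozenset and MappingProxyType both reject modification after loading.
    return MappingProxyType({"stop_words": frozenset(raw["stop_words"])})


def remove_stop_words(text: str, side_data: MappingProxyType) -> list[str]:
    """The process doing its job: split on whitespace and drop stop words."""
    stop_words = side_data["stop_words"]
    return [word for word in text.lower().split() if word not in stop_words]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "side_data.json"  # illustrative file name
    write_side_data(store, ["the", "a", "of", "is"])
    side_data = load_side_data(store)
    tokens = remove_stop_words("The size of a model is large", side_data)
    print(tokens)  # ['size', 'model', 'large']
```

A sketch like this keeps the whole data set in the process's memory, which is where the bottleneck described above would appear once the side data grows large. That is the scaling problem a purpose-built read-only store is meant to ease.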
Explaining LinkedIn’s open-source philosophy, Perisic said he believes participating in open-source projects makes engineers better because their work is exposed to the entire community.
“It seems paradoxical to think that developers write better software for others than they do for themselves, but it actually makes sense,” Perisic said. “When software is written ‘internally,’ developers have a tendency to cut some corners—and I’m as guilty as anyone—especially around documenting, making code easily readable and reusable and having all the right tests in order. The Open Source community has choices and will simply lose patience in trying to figure what your code does if it is too obscure. ‘Internally’ you may not have a choice.
“With open source, developers’ names are attached to the software they create and the entire community can look at it. This puts a human face on code and reputations on the line. Once a developer open sources some software, their names will be forever associated with it. Their design choices and bugs will be visible to all. This is a huge incentive to cross their T’s and dot their I’s. A developer wants to be associated with good stuff that is well written.”