LinkedIn: Open-Sourcing Under the Microsoft Regime
It took us a little bit of time to get to the point where we could make the process work efficiently. Internally, nobody is paid specifically to review the open-source stuff that we do, except maybe somebody in legal to make sure the licenses that we look at are proper.
What happens once you've open-sourced the projects? How much input do you continue to have?
It's kind of a trick in the open-source community because it's rather hard. You do the outreach and you push your project, but your project can take off very, very quickly and become a big thing. Then it becomes tricky to navigate how you continue to develop internally on it as well as watch where the open-source community is taking it. They may take it in another direction that's not necessarily the one that you would rather have. But the project has a right to exist on its own. There's some stuff, for instance, in Kafka that has a goal that doesn't map one-to-one to what we want from it, which is perfectly fine. This occurs for a few significantly large projects, but very few of them reach that level. What we created recently, and we have done it very well, is the opposite, which is projects that were very involved but some years pass and you become less and less involved. Or it can be in the attic and no development is really occurring anymore. And you can look on GitHub and see that nobody has contributed anything in the last year and a half to two years. At that point, how do you mark such a project as having no active development anymore that's visible for people who may want to use it?
Going back to what I said before, you open-source anything that you believe has a right to exist on its own and is not a replication of another open-source project.
So, at what point do you know it's ready to go out to the community?
We do a code check, we look at whether it's secure, and we do a code review with all the test cases and make sure you have good documentation that goes with it.
You mentioned Kafka. What are some of the other better-known projects that have come out of LinkedIn?
The first one was Voldemort, which is a distributed key-value store. Then more recently, Samza is picking up. Samza is a stream-processing framework. Then there is Helix, which is a cluster manager. Rest.li is a way that our services communicate with each other. Dr. Elephant is getting good reviews. Dr. Elephant for us is a way to look at Hadoop code and look at inefficiency within it. We released a machine learning framework, PhotonML, which is getting good reviews. Those are the ones that come to mind off the top of my head.
How do you feel when you see some of these projects go out into the community and flourish?
I feel super-proud; I feel like I've achieved something. It's an accomplishment. I feel proud of having set up an environment where that developer has a feeling that they can create such a thing and it gets picked up by the community and fosters innovation around it. I'm more than excited seeing how Kafka has transformed the industry and created companies around it. Confluent is a company that was built around Kafka. Creating business opportunities around open-source projects that others pick up is one of these things you should feel proud about. Yahoo should feel super-proud of having built up Hadoop. And Google should be proud of having written that paper [Google File System paper] that spawned the creation of Hadoop. I think they missed something by not open-sourcing that stuff, which is too bad. However, Google did open-source TensorFlow, the machine learning library.
How do you decide which technologies you want to open-source?