In June, Microsoft announced its intent to acquire LinkedIn, the business-oriented social networking service, for $26 billion. LinkedIn, which boasts a stable of talented engineers across a spectrum of disciplines, including cloud infrastructure, mobile and web technologies, has had a history of open-sourcing some of its core technology to the benefit of the community. LinkedIn’s Kafka message broker, for instance, helped launch a successful software company and spawned related projects. However, as the company becomes a part of Microsoft, some folks have wondered whether LinkedIn’s focus on open-sourcing technology for developers would continue unabated. In this eWEEK Q&A, Igor Perisic, vice president of engineering at LinkedIn, explains that the company will not only continue to open-source technology, but will try to teach Microsoft a lesson or two about working with the community.
Why is open source important to LinkedIn?
The rationale and how it became important to our culture at LinkedIn is that we have built solutions that leveraged open-source technology very early on. The whole field of engineering and the way that we think about it is that it comes down to craftsmanship. And if you believe in craftsmanship, that means that you are periodically and constantly updating and upgrading your know-how. That should mean you’re also looking at what is happening in the open-source community in a lot of domains.
Within that perspective, contributing to the open-source community is just being part of the community of engineers. For us, what we get in terms of benefit is that we continue to grow the talent of our engineers—because I believe that fundamentally engineers are made better by contributing to open-source projects than just doing things internally.
Will your open-source philosophy change at all when you become part of Microsoft?
Before the close, I am actually not aware of one thing or another. But I will express my opinion. I don’t see it changing at all. The same mechanisms would apply. Now you could imagine a situation where if perhaps what we would like to open-source is a functionality or clone of what Microsoft is selling, then there may be a discussion. But I don’t see us ever doing that in the sense that we’re not going to build a new OS, we’re not going to invent another language and we’re not going to try to replicate any of the big enterprise software that Microsoft has. It’s not what we do anyway. The code paths are different anyway.
I actually think this is a place where we can help Microsoft because their attention to open source came a little bit later than ours, and we can share some of the processes that we have around open source to encourage individuals to contribute.
Can you explain that process or those processes?
Fundamentally, you should be free to open-source anything. Having said that, in order to open-source anything, be a good citizen. By being a good citizen, there are a couple of things that you need to do. One is produce good code—don’t make it a bunch of to-dos and commented things; make sure it compiles and it has the tests and is well-documented, etc.
Two, make sure that you’ve done your due diligence and there is not another open-source project that does exactly what you want to do. Or there may be an open-source project that has similar functionality or intent, in which case your work should be contributed to that older open-source project instead of just creating another one. One of the problems we have in the open-source community is we have so many projects nowadays, and it’s only going to grow; it’s not going to decrease. You may end up in a situation where you don’t necessarily know if something is in active development, stable, alpha, beta, deprecated, etc. It’s hard to navigate. So if you create a new one, just tell us what to compare it to. What does your thing do?
Then you check for known security vulnerabilities. Once you’ve done all of that, make sure that the license is the right one—that you don’t have some type of “copyleft” license for example. And make sure that what you are open-sourcing is not something that your company, in this case LinkedIn, considers a competitive advantage.
It took us a little bit of time to get to the point where we could make the process work efficiently. Internally, nobody is paid specifically to review the open-source stuff that we do, except maybe somebody in legal to make sure the licenses that we look at are proper.
What happens once you’ve open-sourced the projects? How much input do you continue to have?
It’s kind of a trick in the open-source community because it’s rather hard. You do the outreach and you push your project, but your project can take off very, very quickly and become a big thing. Then it becomes tricky to navigate how you continue to develop internally on it as well as watch where the open-source community is taking it. They may take it in another direction that’s not necessarily the one that you would rather have. But the project has a right to exist on its own. There’s some stuff, for instance, in Kafka that has a goal that doesn’t map one-to-one to what we want from it, which is perfectly fine.
This occurs for a few significantly large projects, but very few of them reach that level. What we created recently, and we have done it very well, is the opposite, which is projects where we were very involved but some years pass and we become less and less involved. Or a project can be in the attic, with no development really occurring anymore. And you can look on GitHub and see that nobody has contributed anything in the last year and a half to two years. At that point, how do you mark such a project as having no active development anymore, in a way that’s visible for people who may want to use it?
How do you decide which technologies you want to open-source?
Going back to what I said before, you open-source anything that you believe has a right to exist on its own and is not a replication of another open-source project.
So, at what point do you know it’s ready to go out to the community?
We do a code check, we look at whether it’s secure, and we do a code review with all the test cases and make sure you have good documentation that goes with it.
You mentioned Kafka. What are some of the other better-known projects that have come out of LinkedIn?
The first one was Voldemort, which is a distributed key-value store. Then more recently, Samza is picking up. Samza is a stream-processing framework. Then there is Helix, which is a cluster manager. Rest.li is a way that our services communicate with each other. Dr. Elephant is getting good reviews. Dr. Elephant for us is a way to look at Hadoop code and look at inefficiency within it. We released a machine learning framework, PhotonML, which is getting good reviews. Those are the ones that pop onto the top of my head.
How do you feel when you see some of these projects go out into the community and flourish?
I feel super-proud; I feel like I’ve achieved something. It’s an accomplishment. I feel proud of having set up an environment where that developer has a feeling that they can create such a thing and it gets picked up by the community and fosters innovation around it. I’m more than excited seeing how Kafka has transformed the industry and created companies around it. Confluent is a company that was built around Kafka.
Creating business opportunities around open-source projects that others pick up is one of these things you should feel proud about. Yahoo should feel super-proud of having built up Hadoop. And Google should be proud of having written that paper [Google File System paper] that spawned the creation of Hadoop. I think they missed something by not open-sourcing that stuff, which is too bad. However, Google did open-source TensorFlow, the machine learning library.