Google Keeps Close Eye on Open Source

Q&A: Chris DiBona, a programs manager for Google, talks about how the company uses open-source software and what it contributes to the open-source community.

Chris DiBona, open-source programs manager at Google, gave a talk called "A Year of Open Source at Google" for the Google New York speaker series. Prior to his talk, which was closed to press, DiBona spoke on May 16 with eWEEK Senior Editor Darryl K. Taft about a series of issues such as Microsoft's recent saber rattling over patents, Google's open-source development contributions and what GPLv3 means for Google.

What will you be talking about tonight?

I'm giving a modified form of one of my regular talks… about how Google uses open source, how we keep an eye on it internally, as well as what we do externally in terms of things like the Summer of Code, [and] code release; we release a great amount of code into open source. So I go around and try to talk about that.

Can you talk about the open-source components that are used in software development at Google?

One of the things I talk about is the kinds of projects that we patch, that we use internally. Those include things like the Linux kernel, the GNU Compiler Collection, Python, Wine, Derby, Aspell, DSpace, Autoconf, MySQL, all kinds of great things like that.

What about in terms of open-source software that's used in deployment or production at Google?

We use the Linux kernel … every time you use Google, you're using a Linux machine. And then we have some fairly common open-source tools that we run on top of those, and then on top of those we run our proprietary software for serving Google, Gmail and all the different services.

What are the common tools you mentioned?

Things like the [GNU] binutils, like OpenSSL, OpenSSH, some network monitoring stuff… Basically things you would consider operating-system-level tools.

Are you involved with the Google Code (project hosting) project?

Yes, that's one of the Web sites that we master in our group.

How has that been going? How do you measure what's been going on there?

Well, there are a couple of facets of Google Code that are very important to us. One thing is we host a number of open-source projects that have nothing to do with Google on them. So in doing that we've become the No. 2 Web host for that kind of thing, after SourceForge. So that's been really fantastic.

Another thing that we do there is we turn on software and we have a bunch of documentation there about our APIs. It's sort of a way for coders and developers out in the world to learn more about Google technically, and how their programs can interact with Google technically. And it's been very successful at that. We're very happy with how that's gone.

From your perspective, what do you think has been the impact of the Google Summer of Code project?

Well, there have been a couple of things that are pretty important that have come out of the SoC. The first is that to date we've engaged somewhere around 2,000 developers now. This year it is 1,000 developers, last year it was 600 and the year before that it was 400. So there are 2,000 developers that we've introduced to open-source software development.

And on the other side of things, with open-source projects and having the opportunity to take on students in this manner, they've become very good at taking on new developers. So if you look at projects today, as opposed to when you looked at them three years ago, many of them now have processes and practices and ideas and ways of welcoming in new and inexperienced developers. And I think that's a very powerful thing and a very good thing for open-source software. So from both of those perspectives the project's been very successful.

That was part of what I was trying to get at—helping to bring more people who are interested in computer science into the world of open-source software…

Yeah, if you think about it, there's a lot of great software out there, but it's sort of a tough jump for somebody who's young to switch from being a user of open source to being a developer of it. Because suddenly their code is out there for everybody to see, and they have to be able to interact with a lot of people who are pretty far along in their careers and people they often admire or are intimidated by. And so this is a nice way of making that happen, in my mind.

Do you have any data on how much open-source software Google gives back to the community?

Well, we've given over a million lines of code into open source. That's one way to measure it. And that's a good number; it's impressive, right? But I think more importantly, if you look at every major open-source software project out there, and a lot of minor ones, you'll find Googlers either patching or releasing new features or releasing code for them or right into those projects.

A good example of that is just recently we released a bunch of tools for enabling folks to use MySQL better, with replication and such. So that was really fun to do. And we've released all kinds of things, like incredibly minor changes to incredibly major things… Like the Google Web Toolkit, which is completely open source as well. So we think it's a really good way of sharing our level of innovation as a company with the rest of the world.

Are there any Google technologies that are currently in the pipeline to be open-sourced?

Well, as you know, we do not discuss things we have not released yet. And the reason we don't do that, by the way, is we like to make sure that when we launch something it's ready to be launched. So we're pushing pretty hard for some interesting things for Google Developer Day [May 31].

What impact will the GPLv3 have at Google?

Well, if you had asked me this nine months ago I would have said that it would mean that some GPL 3 programs, we wouldn't be able to adopt them because of the ASP [application service provider] provision in the original version. And I would have said at that time, and I still mean it, that that's not the end of the world, you know. We don't have to use every piece of open-source software out there.

But the most recent draft of GPL Version 3 has actually dropped that provision, so it makes it very easy for us to say it's likely that we'll welcome GPL Version 3 software into the company, even for things that may end up in production. Whereas before, if people opt to have that kind of restriction on [open-source software], we just couldn't use it in production and expose it to the end user.

It was sort of a thing that was like whatever they work on is fine with us, because we're very good at managing incoming code into the company. So it was never really a problem. The latest revision [of GPLv3] is actually pretty good.

Do you have any thoughts on Microsoft's recent claim that free and open-source software violates a large number of Microsoft patents?

Yeah, we saw that, and like most of the world we'd like to see them actually enumerate what [those patents] are. It's more of a wait and see. It's easy to say things like that; it's another thing to see what concrete actions come of it.

But if there is real meat to it then places like Google would have to be concerned, I'd say…

You know, like I said, I don't know. There's just not enough information for us to know right now.

Does Sun's open-sourcing of Java have an impact on the way Google views Java as a development platform?

It doesn't change how we're looking at it, but it does increase the utility of Java for us. So before they had released Java as GPL, we had signed a source code agreement with them where we could give them patches and bugs and all this other stuff, because we have a lot of fairly advanced Java development going on at the company. We have folks like Joshua Bloch working for us and he's a very prominent Java developer and he's involved in the Java Community Process very heavily.

So we always had a way of getting patches in and some features developed. So that was fine for us. But with it being open source, it's actually better for us in a lot of ways, because we can access certain parts of the code in ways we couldn't before. And we can fix them and offer those fixes up without as much ceremony around submitting those patches and features. We can say, OK, it's an open-source project so we can just release this stuff. That's incredibly freeing for us. So we were very happy to see them go GPL there.

Do you, or have you done something like a Black Duck or Palamida assessment of your code?

No, and the reason why is we practice extremely tight control on how code comes into the company. And we're very, very good at training our engineers. So, to give you an idea, I can look at any end binary in the company and I can tell you what open-source software is expressed within that, because of the way that we manage our code base.

So while those kinds of tools are interesting during an acquisition process (and we generally do not talk about our practices around acquisitions), they're not as interesting to us internally. Also, I think that expanding the utility of that code would be useful. Right now I'm not sure how incredibly useful that would be for us to run internally. They are good, quality projects, though.

Well, since you said you have a bunch of proprietary code running on top of a stack that consists of lots of open-source components, I was wondering how you could discern what was in there.

It's worth pointing out that it's much like if you're running an application on top of Linux. It's the same way we sort of run our Web servers, our Web applications. And then we have Linux as a kernel and as an operating system underneath it.

The way we actually bring code into the company when we're using an open-source library is extremely controlled. And the thing is, internally, Google as a company has always had a lot of discipline about how we bring code into the company.

Specifically, when you create a piece of code and you submit it, another Googler has to do a code review of your code before it ever gets into the code repository. And if somebody suddenly showed up and submitted 25,000 lines of code, well that would be questionable. And we have ways of dealing with that that are really very efficient. We tell people you want that to be inside this one directory, you want to tag it in a very specific way so that we can track it… So we're actually quite facile at managing incoming code.
