Google Go Lets Developers Work More With Multicore, Parallel Computing

By Jeff Cogswell  |  Posted 2012-11-05

In January 2010, I reviewed the new language called Google Go for eWEEK. In that review, I focused mainly on the syntax of the language. Since then, the language has changed and evolved, culminating (finally) this year with release 1.0 on March 28 and release 1.0.3 on Sept. 24.

With several minor releases over the past few months fixing early problems, now is a good time to take it for another spin and see how it has grown up.

Go and Multicore Programming

One thing I took a hard look at this time was concurrency. Much of today's Web-based software needs to be able to handle high volumes of users. Concurrency is a must. To run some concurrency tests with Google Go, I used an Intel second-generation Core i7 processor with four cores, each with two threads, for eight virtual CPUs. A quick Google search for concurrency issues in Go shows some people complaining that their programs run slower. However, this was not the case when I tested Go for the second time.

Remember, parallel programming isn't easy, and it's not something where you just turn up the number of cores and let it rip. You have to carefully design your algorithms to work in parallel. And if they're not designed right, there's a good possibility you'll see little or no increase in performance or even a decrease in performance. The algorithms may end up running sequentially, and then add on the overhead of trying to split them up among cores, and the whole thing will take longer to run.

To handle concurrency, Go includes what Google calls “goroutines,” which are similar to the coroutines in other languages. But an important feature of goroutines is that they don't just provide yield and resume. You can also easily communicate between them using variables called channels.

You can certainly communicate between parallel routines in other languages as well, but Go makes it easy. However, this does add to the work you need to do in writing your code. Channels effectively work as shared variables, but synchronization is built in. And there lies possible trouble: If you have a single channel variable and, say, eight computing threads, each running on a virtual core, and you force threads to wait while one is using the channel, then you totally defeat the parallelism.
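To make the channel idea concrete, here is a minimal sketch of goroutines handing results back over a channel. It is not code from my tests; the names square and collectSquares are made up for illustration. Each goroutine sends one value, and the receiving side blocks until a value arrives, which is the built-in synchronization mentioned above.

```go
package main

import "fmt"

// square computes n*n and sends the result on out.
// (Hypothetical helper, named for this example only.)
func square(n int, out chan<- int) {
	out <- n * n
}

// collectSquares starts one goroutine per input value and adds up
// whatever comes back over the shared channel.
func collectSquares(inputs []int) int {
	out := make(chan int) // unbuffered: each send waits for a receive
	for _, n := range inputs {
		go square(n, out)
	}
	total := 0
	for i := 0; i < len(inputs); i++ {
		total += <-out // blocks until some goroutine sends a result
	}
	return total
}

func main() {
	fmt.Println(collectSquares([]int{1, 2, 3, 4}))
}
```

Note that every goroutine funnels through the one channel. That is harmless here because each goroutine sends only once, but it is exactly the pattern that can serialize your program if the goroutines send constantly.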

I was able to easily put together some code that demonstrated this problem. Therefore, it's important to understand how parallel algorithms work and how to make use of parallel concepts like reduction.
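As a rough illustration of reduction (again, a sketch under my own assumptions rather than the code I benchmarked), the function below gives each goroutine a private slice of the data and a private slot for its partial sum, then combines the partial sums once at the end. The name sumReduce and the chunking scheme are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// sumReduce splits nums into up to `workers` chunks. Each goroutine adds up
// its own chunk into a local variable, so there is no shared channel or lock
// inside the hot loop. The partial sums are combined after all goroutines finish.
func sumReduce(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partials := make([]int, workers)
	chunk := (len(nums) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(nums) {
			break // no data left for the remaining workers
		}
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(w int, part []int) {
			defer wg.Done()
			local := 0
			for _, v := range part {
				local += v
			}
			partials[w] = local // each goroutine writes only its own slot
		}(w, nums[start:end])
	}
	wg.Wait() // wait for every partial sum before combining

	total := 0
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 100)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(sumReduce(nums, 8))
}
```

The design choice is that the goroutines only synchronize once, at wg.Wait(), instead of on every addition.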

Nevertheless, using goroutines, you can effectively create a parallel loop. If you're familiar with the Cilk Plus extensions to C++, you'll note that this is similar to the cilk_for loop. However, in Cilk Plus you simply write your loop like a normal C++ loop and the runtime decides how and when to perform the parallelism, and how to divide it among cores. In Go, there's no parallel for loop built into the syntax, but you can easily spawn a goroutine for each iteration like so:

for i := 0; i < 30; i++ {
    go func(i2 int) {
        // The body of the loop goes here; i2 is this iteration's copy of i.
    }(i)
}

However, goroutines don't automatically run on multiple cores. But it's easy to enable multiple cores with a single line of code:

    runtime.GOMAXPROCS(8)

This tells the Go runtime to use eight logical cores. When I tested this out, I put a tight loop inside the parallel function and, as expected, all eight cores of my machine spun. (For this test I used Microsoft Windows and Task Manager, which showed the eight cores maxing out.) You can determine the number of logical cores by calling runtime.NumCPU().
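Putting the two calls together might look something like the following sketch. The wrapper function useAllCores is my own name for illustration; runtime.NumCPU and runtime.GOMAXPROCS are the actual runtime calls, and passing 0 to GOMAXPROCS reports the current setting without changing it.

```go
package main

import (
	"fmt"
	"runtime"
)

// useAllCores asks the runtime how many logical CPUs exist, tells the
// scheduler to use that many, and returns both the CPU count and the
// setting that is now in effect. (Hypothetical helper for this example.)
func useAllCores() (cpus, procs int) {
	cpus = runtime.NumCPU()
	runtime.GOMAXPROCS(cpus)     // returns the previous setting, ignored here
	procs = runtime.GOMAXPROCS(0) // 0 means "just report the current value"
	return cpus, procs
}

func main() {
	cpus, procs := useAllCores()
	fmt.Println("logical CPUs:", cpus, "GOMAXPROCS:", procs)
}
```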


However, I should note that while this is different from runtimes such as Cilk Plus, the documentation for GOMAXPROCS does state, “This call will go away when the scheduler improves.” That tells me that Google has plans to make the scheduling more automatic so you don't have to manually specify the cores, making it more like Cilk Plus.

And indeed, by specifying cores, you can run into trouble. If you set GOMAXPROCS to the number of cores, such as eight, and then write a loop that schedules out, say, 60 goroutines to run in parallel, the scheduler starts exactly eight goroutines running, and subsequent calls to “go” to start a goroutine wait until a core becomes available. In other words, subsequent calls effectively become synchronous. Look at this code carefully:


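// Assumes runtime.GOMAXPROCS(8) was called earlier.
for i := 0; i < 8; i++ {
    go func(c2 int) {
        fmt.Println("starting", c2)
        x := 0
        // Busy work: a tight loop that keeps the core spinning.
        for j := 0; j < 1000000000; j++ {
            x += 1
            x -= 1
        }
    }(i)
}

fmt.Println("Finished with for loop")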

In this case, with eight virtual CPUs, the message “Finished with for loop” will appear immediately; the eight threads are spawned and the loop finishes, and the message prints while the threads continue to run. But change the loop count to nine, and it behaves differently. The ninth call to go pauses and waits for an available thread. Only when one becomes available does it run.

But wait. Things change if we modify the code a bit, and insert a call to Sleep inside the innermost loop.

As in most languages, Sleep yields the thread so other threads can run. And that gave me more clues to what's going on here. When I took the Sleep call back out, and instead changed my call to GOMAXPROCS(7), things worked better. Why is that? Because the outer function needs a thread, as well. By allocating all eight cores to our loops, we leave the outer function unable to continue running asynchronously.

By lowering the number by one, things ran much more smoothly and my outer function didn't freeze up. Next, I made that change to the above code and set the loop count to 64. The scheduler launched all of the goroutines, and every call to go remained asynchronous: the first seven ran on the seven other cores, and as each finished, the scheduler started the next one waiting in line. The string “Finished with for loop” displayed almost immediately, as expected, meaning my outer function continued to run.

The moral here? The scheduler could certainly stand for improvements, as Google seems to recognize. Be careful with it, and recognize the implications of playing around with GOMAXPROCS.

Functional Programming and Closures

Another topic I didn't touch on the last time around concerns closures. Closures are a powerful feature in dynamic languages such as JavaScript, but they can also be abused. In Google Go, if you have a function that spawns additional threads, and that function then ends before the threads do, its variables stick around until all the threads are finished. I tested this out and it worked fine.

The way I did it was by taking the code I just showed you and moving it out of the main and into its own function. I created a variable in that function, and let each loop-thread print out the value of that variable. It continued to work fine even after the function returned.
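A stripped-down version of that experiment might look like the sketch below. It is a reconstruction under my own assumptions, not the exact test code: the names spawn and run, the start channel, and the message text are all hypothetical. The goroutines wait on the start channel so that they provably read the captured variable only after spawn has returned.

```go
package main

import "fmt"

// spawn starts n goroutines that capture the local variable message.
// spawn returns right away; the goroutines read message later.
func spawn(n int, start <-chan struct{}, out chan<- string) {
	message := "still alive" // local to spawn, captured by the closures below
	for i := 0; i < n; i++ {
		go func(id int) {
			<-start // wait until the caller signals that spawn has returned
			out <- fmt.Sprintf("%d: %s", id, message)
		}(i)
	}
}

// run calls spawn, lets it return, then releases the goroutines and
// gathers what they report.
func run(n int) []string {
	start := make(chan struct{})
	out := make(chan string)
	spawn(n, start, out)
	close(start) // spawn has returned by now; let the goroutines proceed
	results := make([]string, 0, n)
	for i := 0; i < n; i++ {
		results = append(results, <-out)
	}
	return results
}

func main() {
	for _, line := range run(3) {
		fmt.Println(line)
	}
}
```

The order of the printed lines can vary from run to run, since the goroutines are scheduled independently.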

As for functional programming, where functions are first-class objects, Go does indeed support that and always has. I tested this out easily enough. But unlike a language like JavaScript, Go is statically typed, so every function value has a type fixed by its signature, and you will often declare named user types for your functions, much like delegates in C#. Rather than do a long write-up, I'll refer you to this blog on Google Go from a couple of years ago that covers it nicely.
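For a quick taste, here is a small sketch of a user-declared function type and functions passed around as values. The names IntOp, apply and adder are invented for this example.

```go
package main

import "fmt"

// IntOp is a user-declared function type: any function that takes an int
// and returns an int can be used where an IntOp is expected.
type IntOp func(int) int

// apply calls op on every element of nums and returns the results
// in a new slice.
func apply(nums []int, op IntOp) []int {
	out := make([]int, len(nums))
	for i, v := range nums {
		out[i] = op(v)
	}
	return out
}

// adder returns a closure that adds delta to its argument.
func adder(delta int) IntOp {
	return func(n int) int { return n + delta }
}

func main() {
	double := func(n int) int { return n * 2 } // a function stored in a variable
	fmt.Println(apply([]int{1, 2, 3}, double))
	fmt.Println(apply([]int{1, 2, 3}, adder(10)))
}
```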

Final Analysis

I liked Go when I reviewed it the first time; I like it even better now. I'm seriously considering porting some of my Web-based apps to it and putting it on Google App Engine and seeing how they do. So far all indications are that they should perform very well. I'm going to make use of the multicore programming, but I'll be careful as I do, since it has some quirks. But the syntax and packages are modern enough that I shouldn't have to pull my hair out writing the code. Sounds like a win to me.
