Harry Shum, Ph.D., the managing director of Microsoft Research Asia, is on a mission.
His primary mission is to lead Microsoft Corp.'s research efforts in China from his office in the heart of Beijing's top academic, science and technology corridor. His immediate goal is to help Microsoft catch and beat Google in the search technology space.
A committed fan of the Pittsburgh Steelers and their smash-mouth football style of play, Shum is not one to shy away from competition, even when it involves friends and former colleagues. Kai-Fu Lee, who formerly held Shum's position at Microsoft, has moved over to Google to lead that company's R&D efforts in China.
Shum met in his Beijing office with eWEEK senior editor Darryl K. Taft for a candid exchange.
What are some of the key areas you guys focus on here?
We focus on five major areas. When we started we knew we wanted to work on the human-computer interface. When we started, it was evident that for Microsoft, we really have to think about that for all of our customers, not only for those users in North America or who always use Roman characters.
For people in China, Korea and Japan, it's very important that we design the computer-user interface so that we can use it very easily like Chinese speech and like handwriting. So we have done a lot of component technology for Chinese and Japanese language, something called the language model came out of our lab. A lot of handwriting technology for the Tablet PC came out of our lab.
Another area is digital media. You probably have a lot of photos and videos, etc., on your computer, so being able to better store and process them is important. We are working on technology for that.
The third area we started here is digital entertainment, because computers are not just for processing, they're being used more for entertainment. So we are working on projects for that.
The fourth area is system and networking. Computers nowadays are all connected, so system issues, networking issues and wireless issues all have to be addressed. And we've done a lot of work in this area.
The fifth area we started a year and a half ago is Web search and data mining, because now this is such an important area—not only for Microsoft, but for the entire industry. We have a group of very talented people here working on that.
You mentioned search and data mining as a key area of focus here. How much more is there to be done in this area? Maybe my view is naïve, but whenever I go to search for something I seem to have pretty good success finding it. So what more do we need?
I wish it could be as simple as that. But the truth is that the search thing is still very, very difficult. And it's difficult because users are very, very different. People have different requirements for search. You could just search for a restaurant and then you get that and be happy. Or people could need to do a search to find information for a term paper. This is the so-called recovery and discovery part. Recovery is where you say you've seen something before, just recover it for me.
When you get to discovery it gets very complicated and I don't think we really understand that yet. Typically, when people do search today it's one or two words about 95 percent of the time. But if you have something reasonably complicated or reasonably long, we're still not there yet.
So there is something called search relevance that Google has been ahead of most competitors in. But the gap is closing. And Yahoo claims statistically that this difference does not even exist anymore between Google and Yahoo. MSN, with a lot of help from MSR, is closing the gap like crazy. We will be catching up with them in a matter of months. And something will be there. But that is only one problem, one very tough problem, however.
There are many other things. It's really about once you get this search thing. First of all, you get this search thing right, that will continue to be a big research issue … even involving system design, architecture, how quickly you can search, for instance.
So it's not just performance you're looking at?
It's not just performance. Performance is certainly an issue. Google now is building this 500-computer cluster, and Microsoft is looking at a lot of things in content delivery. Those are great things that have to be done.
But it's not just the algorithm part of that. It's really a lot of things there. In the end, my view is it's really about the user experience. The way I look at it is I approach search and many other topics as "have we delivered what the user really wants?"
You mean like personalization?
Like personalization, like mobilization. I think we just don't totally understand it all yet. And the funny thing is that even though search probably only gives you about 30 percent of the correct answers, the users are already pretty happy today. Likewise with speech recognition, you have 95 percent correctness and users say this is a piece of … So you see it's a very different thing.
And we as researchers see that even though the technology is flawed it can be useful. But then two years later when people get used to the technology, people will say this Google thing doesn't really give me anything yet. Can search engines please give me something more usable? People's expectations will keep going up.
So I will say this is going to be a long battle. And I'm not happy Google is ahead, but that's OK, we have something to do there as well. So two years later I think things might change. There are a lot of smart people at Google, and a lot of smart people at Microsoft, more smart people outside both. So you will see a lot of innovation going on. And I would say this search thing is really just the beginning.