Although Watson routed the humans in the first “Jeopardy” match, its Final Jeopardy question left everyone scratching their heads.
The Final Jeopardy category was “U.S. Cities.” Watson said, “What is Toronto?” with multiple question marks denoting its lack of confidence. The clue was: “Its largest airport is named for a World War II hero, its second largest for a World War II battle.” The human players, Ken Jennings and Brad Rutter, both got the right question (“What is Chicago?”), but Watson still finished with $35,734. Jennings had $4,800 and Rutter had $10,800.
Despite an otherwise impressive performance, Watson was soundly mocked on Twitter for the final mistake. “The machines don’t know all. Yet,” posted @erickohn.
The Double Jeopardy and Final Jeopardy rounds of the first game aired Feb. 15. The first round of “Jeopardy” had been broadcast on Monday, and the second game of the two-game tournament is scheduled for Wednesday.
Watson’s odd answer was a result of several confusing factors, according to David Ferrucci, whose post-game analysis appeared on IBM’s A Smarter Planet blog. “Jeopardy” category names are tricky because they “only weakly suggest” the expected answer, so Watson tends to downgrade the significance of the category name when calculating its answer, Ferrucci said. If the clue itself had included “U.S. city,” Watson would have given U.S. cities more weight in its search, he said.
Watson was also probably confused by the fact that there are several cities named Toronto in the United States, and the Canadian Toronto has a baseball team in the American League, according to Ferrucci. “Chicago” was the second answer on Watson’s possible list, according to A Smarter Planet.
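To make Ferrucci’s point concrete, here is a minimal sketch of how a lightly weighted category can let a non-U.S. answer outrank a U.S. one. It is not Watson’s actual scoring method: the function name `rank_candidates`, the `category_weight` parameter and every number below are invented for illustration.

```python
def rank_candidates(candidates, category_weight):
    """Rank answers by clue evidence plus a weighted bonus for fitting the category.

    Each candidate is (name, clue_evidence, matches_category).
    All values are hypothetical; this is not IBM's scoring formula.
    """
    scored = []
    for name, clue_evidence, matches_category in candidates:
        bonus = category_weight if matches_category else 0.0
        scored.append((name, clue_evidence + bonus))
    # Highest combined score first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented evidence scores: Toronto slightly ahead on the clue text alone,
# Chicago the only one that fits the "U.S. Cities" category.
candidates = [
    ("Toronto", 0.50, False),
    ("Chicago", 0.45, True),
]

weak_category = rank_candidates(candidates, category_weight=0.02)
strong_category = rank_candidates(candidates, category_weight=0.20)

print(weak_category[0][0])    # top answer when the category barely counts
print(strong_category[0][0])  # top answer when the category counts heavily
```

With the category nearly ignored, the sketch puts Toronto first and Chicago second; giving the category more weight flips the order.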
Despite the mistake, Ferrucci was pleased with the outcome. With a confidence level of about 30 percent, Watson knew it didn’t know the answer and so bet “intelligently,” risking only $947.
“That’s smart,” Ferrucci said. “You’re in the middle of the contest. Hold onto your money. Why take a risk?”
Betting Algorithm in Full Force
Watson’s betting algorithm was in full force, as it found both Daily Double clues in the round. Watson wagered $6,436 and $1,246, respectively. “I won’t ask,” said the host, Alex Trebek.
Players often take into account other players’ scores, their confidence and their gut feeling when making wagers, which allows them to bet aggressively, according to Stephen Baker, the author of “Final Jeopardy,” a book about Watson. Watson’s calculations are strictly based on its confidence scores, he said.
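A wager driven only by confidence, as Baker describes, might look something like the sketch below. This is an assumption-laden illustration, not IBM’s betting algorithm: the function `wager`, its thresholds and the sample score are all hypothetical, and it makes no attempt to reproduce Watson’s actual bets.

```python
def wager(score, confidence, floor=0.5, min_fraction=0.03, max_fraction=0.5):
    """Return a bet that depends only on the current score and confidence.

    At or below `floor` confidence the bet is a token `min_fraction` of the
    score; above it, the bet scales linearly up to `max_fraction`.
    All parameters are hypothetical.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # How far confidence sits above the floor, rescaled to the range 0..1
    excess = max(0.0, confidence - floor) / (1.0 - floor)
    fraction = min_fraction + (max_fraction - min_fraction) * excess
    return round(score * fraction)


# Hypothetical score of $36,000 at low and high confidence
cautious = wager(36000, 0.30)
bold = wager(36000, 0.90)
print(cautious, bold)
```

Unlike a human player, the sketch never looks at the opponents’ scores; a low-confidence clue always yields a small bet, whatever the state of the game.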
It’s hard for a computer to calculate confidence, according to Nico Schlaefer, a student at Carnegie Mellon University who worked on the Watson project. “Humans usually know whether they know the answer. Watson may not,” he said.
Schlaefer worked on the algorithm that allowed Watson to gather relevant source material to find the answer and supporting evidence. Another CMU student on the project, Hideki Shima, worked on the algorithm that assigns each candidate answer on Watson’s list a score reflecting how well the supporting evidence backs it.
When asked a question about items stolen from a museum in 2003, Watson had only 32 percent confidence in its first-choice answer. It said “I’m going to guess,” before giving the right answer.
IBM hopes to use the deep Q&A technology behind Watson to create systems that require lots of data analysis in a wide variety of fields, including legal, government and health care. “It’s limitless, the number of things you could apply this to,” IBM Research Program Manager David Shepler said during the broadcast.
In the legal field, lawyers could have access to a “vast, self-contained database” loaded with all of the internal and external information relating to litigation, protecting intellectual property, writing contracts or negotiating an acquisition, Robert Weber, IBM’s senior vice president of legal and regulatory affairs, wrote in the National Law Journal.
“Think about the possibilities for medical diagnosis support, for better anticipating the energy needs of utilities, or for protecting insurers, banks and governments from fraud,” Weber said.
Social services employees could use a Watson-like system to easily differentiate among the claims that come in each day, Anne Altman, a general manager in IBM Global Public Sector, wrote in Government Technology. The system could separate out the claims for life-saving treatments as well as help caseworkers find similar cases from the past, she said.
Watson appeared to have breezed through Double Jeopardy, but that was apparently not the case. Watson crashed multiple times during the taping, said NOVA producer Michael Bicks, who was there. The half-hour match took four hours to tape, he said.
At the end of the game, the IBM team was still nervous about the outcome of the tournament because they knew “all the different ways it could lose,” Bicks said.
Watson beat the humans to buzz in and answer 24 of 30 clues. The computer nailed answers on an impressive variety of topics, ranging from architecture to biological science to classical music to “Saturday Night Live.”
If Watson wins the three-day, two-game tournament, IBM will donate the full $1 million prize to charity.