Google, UAL Gaffe Underscores Need for Smarter Web Crawlers

By Clint Boulton  |  Posted 2008-09-11

A 2002 Chicago Tribune story on UAL Corp.'s bankruptcy filing popped up on Google News on Monday, sending shares in the parent company of United Airlines to $3 from $12.50 on the Nasdaq. The stock rallied Tuesday (thank heaven) and now sits at $10.50.

The Tribune described the incident in a press release, and Google gave its own account in a blog post.

Google said its search bot, which crawls pages online and catalogs content, discovered a new link on Sept. 6 on the Web site of Tribune's South Florida Sun-Sentinel newspaper, in a section called "Popular Stories: Business."

The link did NOT, and this is key, include a dateline, but the Sun-Sentinel page displayed a fresh date, "September 7, 2008" (Eastern), at the top of the page above the article. Google added in its post:

Because the Sun-Sentinel included a link to the story in its "Popular Stories" section, and provided a date on the article page of September 7, 2008, the Google News algorithm indexed it as a new story. We removed this story as soon as we were notified that it was posted in error.

The article then became available through Google News service, which passed it along to people who had created a custom Google News alert about United Airlines. By Monday, Sept. 8, the UAL story began circulating via a post by research firm Income Securities Advisors that was made available to users of financial news service Bloomberg LP.

Boom! UAL shares plummeted. Now, here's the trick. If human stock traders had been the cause of UAL's stock drop, I would blame them. Even without a date on the story, traders covering UAL would have known (we hope) that the story was old. The sell-off would have been avoided.

Some say the issue could have been avoided if the Sun-Sentinel had provided a publication date for the original Tribune article, enabling the Google News crawler to reject the story as irrelevant.

Perhaps, but that's not necessarily true. As a Google Watcher and a Google Alerts subscriber, I occasionally see articles on Google Alerts that were published months ago. But because I cover Google extensively (some would say exhaustively), I have enough historical knowledge to discern old from new, even without a date on the article.

Unfortunately, human traders weren't the cause of the UAL issue. Instead, some search crawlers trawling the Web for news based on headlines and financial data executed stock trades on the fly. Unlike humans, who consider metadata such as the date, along with UAL's history, the machines apparently can't yet parse the deeper meaning behind what they find.

Analysts Weigh In on the Google UAL Gaffe

Google and the Tribune are pointing fingers at each other: Google says the Sun-Sentinel should have included a dateline on the Tribune story, while the Tribune blames Google's bot, adding that it asked Google months ago to stop crawling its network of newspaper Web sites. Google denies this.

"The claim that the Tribune Company asked Google to stop crawling its newspaper Web sites is untrue," a Google spokesperson told me.

I turned to Search Engine Land's Danny Sullivan for clarity. He told me:

Things like this have happened, as I've seen personally, but not in such a big fashion. Better verification of dates, better working between the news search engines and news sites would help, in particular perhaps more dependence on feeds. But also, people doing the basic amount of fact checking of a major story before feeding it into a major wire service would have helped. That's where 90 percent of the blame lies.

IDC's Susan Feldman had her own take on the matter, noting that the lack of a date for the article, which as Sullivan noted would have been added by humans, "tripped the whole train of events that eventually tripped up the automated trading programs." She said:

While wary humans should be able to spot this kind of mistake, computers can't unless they have been programmed to. And, to be honest, apparently a lot of humans weren't wary enough to spot the lack of a date either. So, human negligence kicked up the ranking of the article as the very human rumor mill kicked in. This has been happening ever since people started talking to each other. Think about the Teapot Dome scandal, or the War of the Worlds. The problem with any automated approach to processing information is that computers follow the rules that are set by humans. If the rule to check the date or kick out a document for human scrutiny is not in the rule base, then the computer processes a document as if it is current, and that triggers alerts, and trading problems.

Feldman says the solution is to improve the newspaper site, the crawlers and the automated trading programs, as well as having a better understanding of the unforeseen consequences of getting the wrong information at the wrong time to the wrong people.
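To make Feldman's point about the rule base concrete, here is a minimal sketch in Python of a "check the date or kick the document out for human scrutiny" rule. It is not how Google News or any trading program actually works. The names (Article, Route, route_article), the two-day freshness window and the sample dates are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Route(Enum):
    PROCESS = "process as current"
    REVIEW = "hold for human review"
    REJECT = "reject as stale"


@dataclass
class Article:
    label: str
    # The article's own publication date (dateline), if the page supplies one.
    published: Optional[date]
    # The date printed on the surrounding page, which may simply be today's date.
    page_date: date


def route_article(article: Article, today: date, max_age_days: int = 2) -> Route:
    """Hypothetical rule: only the article's own date counts as proof of freshness."""
    if article.published is None:
        # No dateline: do not fall back to the page date; ask a human instead.
        return Route.REVIEW
    if article.published > today:
        # A date in the future is suspicious, so a human should look at it too.
        return Route.REVIEW
    if today - article.published > timedelta(days=max_age_days):
        return Route.REJECT
    return Route.PROCESS


today = date(2008, 9, 8)
page_date = date(2008, 9, 7)  # the fresh date shown on the Sun-Sentinel page

# The story as it appeared: no dateline, only the page's fresh date.
undated = Article("UAL bankruptcy story", None, page_date)
# The same story with a dateline; the exact 2002 day here is a placeholder.
dated = Article("UAL bankruptcy story", date(2002, 12, 1), page_date)

print(route_article(undated, today).value)
print(route_article(dated, today).value)
```

Under these assumptions, the undated story is held for a person to check rather than treated as current, and the same story with a 2002 dateline is rejected outright. The design choice worth noting is that a missing date routes to a human instead of defaulting to "new," which is the gap Feldman describes.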

So clearly what we need are smarter algorithms and therefore smarter crawlers, not only from Google, but from the automated trading brokers. We can't continue to have these gaffes because they will seriously disrupt Wall Street, the aorta that carries the lifeblood through the country.

Look at how UAL is suffering from an article published six years ago. Imagine if something similar happened to Google or Microsoft.

That would be even more embarrassing, but maybe that's what it would take to galvanize the companies into improving their search algorithms.

