Google, UAL Gaffe Underscores Need for Smarter Web Crawlers

This is the story of how human and machine errors, along with Google and the Chicago Tribune, sent the stock of UAL (United Airlines' parent company) hurtling down Wall Street's chasm. Search Engine Land's Danny Sullivan and IDC's Susan Feldman weigh in as Google and the Tribune seek to move past the unfortunate, untimely resurfacing of a 6-year-old bankruptcy report.

A 2002 Chicago Tribune story on UAL Corp.'s bankruptcy filing popped up on Google News on Monday, sending shares in the parent company of United Airlines to $3 from $12.50 on the Nasdaq. The stock rallied Tuesday (thank heaven) and now sits at $10.50.
The Tribune outlined the incident in a press release. Google gave its account in a blog post.
Google said its search bot, which crawls pages online and catalogs content, discovered a new link on Sept. 6 on the Web site of Tribune's South Florida Sun-Sentinel newspaper, in a section called "Popular Stories: Business."

The link did NOT, and this is key, include a dateline, but the Sun-Sentinel page displayed a fresh date, "September 7, 2008" (Eastern), at the top of the page above the article. Google added in its post:

Because the Sun-Sentinel included a link to the story in its "Popular Stories" section, and provided a date on the article page of September 7, 2008, the Google News algorithm indexed it as a new story. We removed this story as soon as we were notified that it was posted in error.

The article then became available through the Google News service, which passed it along to people who had created a custom Google News alert about United Airlines. By Monday, Sept. 8, the UAL story began circulating via a post by research firm Income Securities Advisors that was made available to users of financial news service Bloomberg LP.
Boom! UAL shares plummeted. Now, here's the trick. If human stock traders had been the cause of UAL's stock drop, I would blame them. Even without a date on the story, traders covering UAL would have known (we hope) that it was old. The sell-off would have been avoided.

Some say the issue could have been avoided if the Sun-Sentinel had provided a publication date for the original Tribune article, enabling the Google news crawler to reject the story as irrelevant.
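
To make that argument concrete, here is a minimal sketch of the kind of freshness check it implies. It is not Google's algorithm, whose details are not public; the names CrawledPage, effective_date and is_fresh, the two-day threshold, and the December 2002 article date are all hypothetical, chosen only for illustration. The idea is simply that an explicit publication date on the article, when present, overrides the date shown at the top of the page.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class CrawledPage:
    """Hypothetical record of what a crawler saw on one page."""
    url: str
    page_date: date                      # date shown at the top of the page
    article_date: Optional[date] = None  # explicit publication date, if the site gives one


def effective_date(page: CrawledPage) -> date:
    """Prefer the article's own publication date; fall back to the page date."""
    if page.article_date is not None:
        return page.article_date
    return page.page_date


def is_fresh(page: CrawledPage, crawl_date: date, max_age_days: int = 2) -> bool:
    """Treat a story as new only if its effective date is recent enough.

    max_age_days is an assumed threshold, not a documented value.
    """
    age = crawl_date - effective_date(page)
    return age <= timedelta(days=max_age_days)


# Without a publication date, only the page date is available.
undated = CrawledPage(
    url="http://example.com/story",      # placeholder URL
    page_date=date(2008, 9, 7),
)

# With a publication date (placeholder day in 2002), the story can be rejected.
dated = CrawledPage(
    url="http://example.com/story",
    page_date=date(2008, 9, 7),
    article_date=date(2002, 12, 1),
)

print(is_fresh(undated, crawl_date=date(2008, 9, 7)))  # True
print(is_fresh(dated, crawl_date=date(2008, 9, 7)))    # False
```

In this sketch the undated page passes as new because the only date available is the one at the top of the page, which is roughly the situation Google described. The same page with a publication date attached is rejected as stale.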

Perhaps, but that's not necessarily true. As a Google Watcher and a Google Alerts subscriber, I occasionally see articles on Google Alerts that were published months ago. But because I cover Google extensively (some would say exhaustively), I have enough historical knowledge to discern old from new, even without a date on the article.
Unfortunately, human traders weren't the cause of the UAL issue. Instead, some search crawlers trolling the Web for news executed stock trades on the fly, based on headlines and financial data. Unlike humans, who consider context such as the date and UAL's history, the machines apparently can't yet parse the deeper meaning behind what they find.