Which Is Scarier: Dirty Data in the Hands of the FBI or Bloggers?

By Lisa Vaas  |  Posted 2005-08-29


Just what we need, in this era of identity theft and unclean data: another reason to fear and loathe a database.

This time, it's the FBI's criminal database, which, it turns out, can miss some 11.7 percent of criminal records and falsely flag 5.5 percent of people with clean histories when it is searched by name.

The numbers come from a review of the National Crime Information Center and the Interstate Identification System recently commissioned by the National Association of Professional Background Screeners in order to evaluate how accurate and complete criminal records are.

Not particularly, as it turns out.

The author of the review, Craig N. Winston, relied primarily on a fairly recent report, from 2001, by the Bureau of Justice Statistics, which found that the "accuracy and completeness of criminal history records is the single most serious deficiency affecting the Nation's criminal history record information systems."

The BJS report analyzed 93,274 background checks from Florida licensing or employment applicants, 323 public housing applicants, and 2,550 volunteers. Out of that group, when compared to fingerprint-verified criminal histories, name checks turned up 11.7 percent false negatives and 5.5 percent false positives.

That means that out of 10,673 subjects whose criminal records were verified by fingerprints, name checks missed 1,252, returning a false clean slate. And of the 82,610 individuals determined to have no criminal record, name checks wrongly flagged 4,562 as having one.
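The two percentages fall straight out of those counts. A quick check in Python, using only the figures quoted from the BJS comparison:

```python
# Counts from the BJS comparison of name checks against
# fingerprint-verified criminal histories.
with_record = 10_673      # subjects whose record was verified by fingerprints
missed = 1_252            # name check returned a false "clean" result
without_record = 82_610   # subjects with no criminal record
wrongly_flagged = 4_562   # name check reported a record that wasn't there

# False negatives are measured against people who do have records;
# false positives against people who don't.
false_negative_rate = missed / with_record
false_positive_rate = wrongly_flagged / without_record

print(f"false negatives: {false_negative_rate:.1%}")
print(f"false positives: {false_positive_rate:.1%}")
```

Note that the two rates have different denominators, which is why a 5.5 percent rate can still produce far more false positives than false negatives when most people being checked have no record at all.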

Based on those findings, the BJS estimated that if name checks had been used in place of the 6.9 million fingerprint checks the FBI conducted in 1997, they would have produced 346,000 false positives and 70,200 false negatives.
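The estimate doesn't say how many of the 6.9 million people checked actually had records, but the two figures can be cross-checked against each other. The sketch below assumes (our assumption, not necessarily the BJS method) that the Florida error rates apply unchanged: it back-solves the number of subjects with records from the 70,200 false negatives, then applies the false-positive rate to everyone else. The result lands within about 1 percent of the reported 346,000, which suggests roughly 600,000 of those checked had records.

```python
# Back-of-the-envelope consistency check on the 1997 extrapolation.
# Assumes the Florida name-check error rates carry over unchanged.
total_checks = 6_900_000
fn_rate = 1_252 / 10_673   # false-negative rate (about 11.7%)
fp_rate = 4_562 / 82_610   # false-positive rate (about 5.5%)
reported_fn = 70_200
reported_fp = 346_000

# If 11.7% of people with records were missed, how many had records?
implied_with_record = reported_fn / fn_rate
# Everyone else is exposed to the false-positive rate.
implied_fp = (total_checks - implied_with_record) * fp_rate

print(f"implied subjects with records: {implied_with_record:,.0f}")
print(f"implied false positives:       {implied_fp:,.0f}")
print(f"gap vs. reported: {abs(implied_fp - reported_fp) / reported_fp:.1%}")
```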

Granted, that's old data. Winston, an assistant professor of criminal justice at Sonoma State University, told me that things have been improving since then and, he hopes, will continue to improve.

Still, core problems remain.

One major problem is linking data to the proper individual and case. Aliases, false identifying information and clerical errors all create duplicate records. Fingerprinting can overcome such problems, but, as Winston pointed out, Burger King isn't going to start fingerprinting potential employees any time soon.
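To see how a name check goes wrong in both directions, here is a toy sketch with made-up records (every name, date and ID is hypothetical). It groups records on a normalized name plus date of birth, roughly what a name check has to go on, and compares that with grouping on a fingerprint ID. A one-letter clerical typo splits one person into two records, while two different people who share a name and birth date collapse into one.

```python
from collections import defaultdict

# Hypothetical records: (record_id, name, date_of_birth, fingerprint_id)
records = [
    ("R1", "John A. Smith", "1970-03-14", "FP-001"),
    ("R2", "Jon A. Smith",  "1970-03-14", "FP-001"),  # typo: same person as R1
    ("R3", "JOHN A SMITH",  "1970-03-14", "FP-002"),  # different person, same name/DOB
]

def name_key(name: str, dob: str) -> tuple[str, str]:
    """Lowercase, drop punctuation, collapse spaces; pair with date of birth."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return (" ".join(cleaned.split()), dob)

def group_by(key_fn) -> dict:
    """Group record IDs by whatever key the matching strategy produces."""
    groups = defaultdict(list)
    for rec in records:
        groups[key_fn(rec)].append(rec[0])
    return dict(groups)

by_name = group_by(lambda r: name_key(r[1], r[2]))
by_print = group_by(lambda r: r[3])

print("name-keyed groups:       ", by_name)   # R1 and R3 wrongly merged, R2 split off
print("fingerprint-keyed groups:", by_print)  # R1 and R2 correctly linked
```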

Some states have mitigated the problem by implementing a case tracking system that ties individuals' names to their case identification numbers. However, states still report trouble linking names with numbers, particularly when records are modified, as happens with plea bargaining.
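A case tracking system of that general kind can be sketched as a table keyed on the case number rather than on the name. The structure, field names and case-number format below are hypothetical, not any state's actual system. A plea bargain that changes the charge is recorded against the case ID, so the history stays intact, but a lookup by a slightly different spelling of the name still comes up empty.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    case_id: str
    subject_name: str
    charges: list[str] = field(default_factory=list)  # charge history, latest last

cases: dict[str, Case] = {}

def open_case(case_id: str, name: str, charge: str) -> None:
    cases[case_id] = Case(case_id, name, [charge])

def amend_charge(case_id: str, new_charge: str) -> None:
    """Record a modification, such as a plea to a lesser charge, against the case ID."""
    cases[case_id].charges.append(new_charge)

def cases_for(name: str) -> list[Case]:
    """Name-based lookup: exact string match, so spelling variants miss."""
    return [c for c in cases.values() if c.subject_name == name]

open_case("2005-CR-0147", "Maria Lopez", "felony theft")
amend_charge("2005-CR-0147", "misdemeanor theft")  # plea bargain

print(cases["2005-CR-0147"].charges)
print(len(cases_for("Maria Lopez")), len(cases_for("Maria Lopes")))
```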


Another problem is inconsistent formatting between states. As when companies merge, disparate formats churn out records with blank data fields or fields marked with the dreaded label "unknown." (I know that label well. It popped up in my .CSV file of exported Yahoo contacts, and it has stalled my migration to Google's Gmail and its tempting new Google Talk, because I ain't going nowhere without my 1,798 contacts, thank you very much. And yes, I know Skype is better than Google Talk, but c'mon, we're talking Google here. I want to get my system Googlized, and I really, really want to be able to search my e-mail content!)
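Merging records from sources that name their columns differently, and that say "no data" in different ways, is a standard normalization chore. A minimal sketch using Python's `csv` module, assuming two hypothetical state exports (the sample data and column names are invented): each source's columns are mapped onto one shared schema, and blanks and placeholder labels such as "unknown" become a real missing value instead of text that looks like data.

```python
import csv
import io

# Two hypothetical exports with different column names and
# different ways of recording missing values.
STATE_A = "name,dob,offense\nAlex Reed,1981-07-02,burglary\nSam Hill,,unknown\n"
STATE_B = "full_name,birth_date,charge\nPat Cole,Unknown,\n"

# Map each source's columns onto one shared schema.
COLUMN_MAPS = {
    "A": {"name": "name", "dob": "dob", "offense": "offense"},
    "B": {"full_name": "name", "birth_date": "dob", "charge": "offense"},
}
MISSING = {"", "unknown", "n/a"}  # placeholder values treated as no data

def normalize(source: str, text: str) -> list[dict]:
    mapping = COLUMN_MAPS[source]
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for src_col, value in raw.items():
            value = (value or "").strip()
            # Blanks and "unknown"-style labels become None.
            row[mapping[src_col]] = None if value.lower() in MISSING else value
        rows.append(row)
    return rows

merged = normalize("A", STATE_A) + normalize("B", STATE_B)
for row in merged:
    print(row)
```

Normalizing to `None` doesn't recover the lost information, of course; it just keeps "unknown" from being matched, sorted or counted as if it were a real value.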

Time lag between transmission of data in a criminal case is another serious problem. A 2005 Bureau of Justice report found that the average number of days for repositories to receive and process criminal information was 24 when it came to arrests, 31 when it came to prison admission, and a whopping 46 when it came to court disposition.

That's a problem when it comes to hiring new employees or evaluating rehabilitation efforts, as an administrator at a correctional facility in the Midwest pointed out.
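To make the lag concrete, here is a sketch of what a background check would see on a given date. It uses the report's average delays but makes a simplifying assumption of ours: every event takes exactly the average lag to reach the repository. The case history and dates are hypothetical. A check run in early July shows the arrest but not the court outcome, which could just as easily be an acquittal.

```python
from datetime import date, timedelta

# Average days for repositories to receive and process each event (2005 report).
AVERAGE_LAG_DAYS = {"arrest": 24, "prison admission": 31, "court disposition": 46}

def visible_events(events: list[tuple[str, date]], check_date: date) -> list[str]:
    """Events a check on check_date would see, assuming each takes exactly the average lag."""
    seen = []
    for kind, occurred in events:
        posted = occurred + timedelta(days=AVERAGE_LAG_DAYS[kind])
        if posted <= check_date:
            seen.append(kind)
    return seen

# Hypothetical case: arrest, then a court disposition six weeks later.
history = [
    ("arrest", date(2005, 5, 1)),
    ("court disposition", date(2005, 6, 12)),
]

print(visible_events(history, date(2005, 7, 1)))  # arrest only
print(visible_events(history, date(2005, 8, 1)))  # arrest and disposition
```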


Another problem is discrepancies in how states classify crimes. For example, selling marijuana is a felony in California and Texas, but selling up to 25 grams in New York or 20 grams in Ohio is a misdemeanor. How exactly do we classify, in a national database, whether someone's a serious criminal if we can't even agree between the states what a serious criminal is?

"Clearly that could make a significant difference to an employer who won't hire anyone related to any drug charges," Winston pointed out.
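Here is how that plays out when a national database has to attach one label to one act. The sketch encodes only the examples given above, and it assumes, purely for illustration, that sales above the New York and Ohio limits count as felonies; any state not listed falls through to "felony" too, which is itself a simplification. The same 22-gram sale comes out differently depending on where it happened.

```python
# Misdemeanor thresholds from the examples in the text. States not listed
# here (California and Texas in the text) treat any sale as a felony in this
# sketch. Treating larger NY/OH sales as felonies is an illustrative assumption.
MISDEMEANOR_LIMIT_GRAMS = {"NY": 25, "OH": 20}

def classify_sale(state: str, grams: float) -> str:
    """Label a marijuana sale under the simplified per-state rules above."""
    limit = MISDEMEANOR_LIMIT_GRAMS.get(state)
    if limit is not None and grams <= limit:
        return "misdemeanor"
    return "felony"

# One identical act, four different jurisdictions.
for state in ["CA", "TX", "NY", "OH"]:
    print(state, classify_sale(state, 22))
```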

What's the answer? Perhaps it lies in Oracle's Data Hubs. Maybe we really do need one huge database instance. But it starts to get a little scary. It reminds me of something out of The Lord of the Rings.

"One Hub to rule them all,
One Hub to find them,
One Hub to bring them all
and in the darkness bind them."

One big Oracle data hub watching over us. Scary. I hope it relies on New York's and Ohio's sentencing guidelines. But seriously, why should I trust Oracle with my data? I don't trust anybody with my data. Period. I don't trust motor vehicle registries, insurance firms, marketing companies or public records such as court documents and licenses. If I ever get the misguided inkling to start trusting them, I just go back to Baselinemag.com's story on the rising threat posed by bad data. The article's tale of an innocent victim who was left to rot in jail thanks to an identity mix-up will convince you that there really is no reason to have faith in those who control our data.

If you want another reason to snip your broadband connection and crawl into a shack in the woods, look no further than ZabaSearch's Zafka-esque plans to hook blogging into its person-search results. That's right: not only is it easier than ever to collate every piece of personal information that's ever been electronically churned out, no matter how obsolete or inaccurate, but now they're going to let the blogosphere loose on you.


Now, no offense to the solid journalism being done by reputable news reporters in the blogosphere, but puh-leaze! Do we really want to let unmoderated yahoos pour forth their revenge, their unsubstantiated rumors, their bottomless pit of nonsense, on anybody whom they choose to dogpile?

Considering all the news, this was a dark day for databases. It just goes to show how perfectly good technology can be mishandled in intensely creative ways. I'm going on vacation this week, and when I show my face again at Oracle Open World, I hope the database madness will have settled down. In the meantime, please e-mail me and let me know what's on your mind heading into Open World. Also, you SQL Server DBAs out there: I want to know whether you like the idea of programmers having direct access to the database without the need for Transact-SQL. It's coming with the tight integration of the stack in SQL Server 2005. Good thing, or over your dead bodies?

Lisa Vaas is Ziff Davis Internet's news editor in charge of operations. She is also the editor of eWEEK.com's Database and Business Intelligence topic center. She has been with eWEEK and eWEEK.com since 1995, most recently covering enterprise applications; database technology; and RSS, syndication and blogging technologies. She can be reached at lisa_vaas@ziffdavis.com.

