ID System Hurdles: Connectivity, Data Cleaning

If the U.S. Government were to seek to establish a national identity card, it would confront a rat's nest of technical problems, despite recent advances.

The three big immediate technical issues are authentication, verification and encryption. The good news is that biometric authentication systems based on fingerprint scanning or visual face recognition are now relatively cheap and simple to deploy. In addition, digital signature systems based on public-key infrastructure technology are now mature, providing ways to check the source of information and to ensure data has not been modified since it was issued. Further, strong encryption technology (as typified by the new federal Advanced Encryption Standard released Dec. 4) provides adequate safeguards for transmission of sensitive data.

However, even with more resources available to system builders, it has proved very difficult to make large-scale, distributed database systems such as this work.

If officials want to use a centralized database for real-time checks on data, such as for passengers at an airport, the system has to be highly scalable. U.S. airlines carried 666 million passengers last year, according to the Air Transport Association of America Inc. That's about 1.8 million passengers a day, or about 1,700 passengers a minute, assuming an 18-hour airport day.

However, performance itself isn't an issue. According to Winter Corp.'s year 2000 large-database-systems survey, the U.S. Customs Service operates the world's fastest online transaction processing database system, one that processes 26,655 transactions a second (or 1.6 million transactions a minute).
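The arithmetic behind these figures can be checked with a short sketch. It assumes, purely for illustration, that each passenger check costs one database transaction; the constant and function names are hypothetical and not part of any real system.

```python
# Figures quoted in the article.
PASSENGERS_PER_YEAR = 666_000_000   # Air Transport Association figure
AIRPORT_HOURS_PER_DAY = 18          # the article's assumed airport day
CUSTOMS_TPS = 26_655                # Winter Corp. survey figure

def passengers_per_day(per_year, days=365):
    """Average daily passenger load."""
    return per_year / days

def passengers_per_minute(per_day, hours=AIRPORT_HOURS_PER_DAY):
    """Average load per minute over the assumed airport day."""
    return per_day / (hours * 60)

per_day = passengers_per_day(PASSENGERS_PER_YEAR)
per_minute = passengers_per_minute(per_day)
capacity_per_minute = CUSTOMS_TPS * 60

# Assumption: one transaction per passenger check.
headroom = capacity_per_minute / per_minute

print(f"{per_day:,.0f} passengers a day")
print(f"{per_minute:,.0f} passengers a minute")
print(f"{capacity_per_minute:,} transactions a minute")
print(f"capacity is about {headroom:,.0f} times the average load")
```

Under that one-transaction assumption, the quoted system's capacity exceeds the average passenger load by a factor on the order of 900. Peak loads would be higher than the average, but the margin is wide.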

Availability is a more difficult technical issue, especially with a widely distributed system dependent on many point-to-point communication links. Communication failures happen; backup links or a fallback to a subset of data stored locally at airports would be needed.
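A minimal sketch of the local-fallback idea follows. The names `check_traveler`, `query_central` and `local_subset` are hypothetical stand-ins, not a real agency interface; the point is only that a failed link degrades to a smaller local data set rather than halting checks.

```python
class CentralUnavailable(Exception):
    """Raised by the (hypothetical) central query when the link is down."""

def check_traveler(traveler_id, query_central, local_subset):
    """Try the central database first; fall back to the local subset.

    Returns (record, source) so the operator knows which data was used.
    The record is None if the traveler is not in the consulted source.
    """
    try:
        return query_central(traveler_id), "central"
    except CentralUnavailable:
        # The local copy holds only a subset of the data, so a miss here
        # is weaker evidence than a miss against the central database.
        return local_subset.get(traveler_id), "local"

# Usage with stand-in data: simulate a communication failure.
def failing_central(traveler_id):
    raise CentralUnavailable("link down")

local = {"T-1001": {"name": "sample traveler"}}
record, source = check_traveler("T-1001", failing_central, local)
print(source, record)
```

Returning the source along with the record is a design choice: a result drawn from a partial local copy probably deserves different handling than one drawn from the full database.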

To avoid this issue, the Immigration and Naturalization Service's INSPass border-crossing program uses scanners that verify hand-print data against data stored securely on a smart card carried by frequent U.S. travelers instead of querying a central database.

However, even if the technology does its job perfectly and without security compromise, the system is only as strong as its weakest links—human operators, flawed administrative policies and data quality problems.

Oracle Corp. CEO Larry Ellison, who is publicly calling for a national ID card system (see story, Page 45), muted his comments in an Oct. 8 Wall Street Journal article to say the real problem was not really ID cards but a lack of database integration among federal agencies that store data about those who live in or visit the United States. "Do we need more databases? No, just the opposite. The biggest problem today is that we have too many," he wrote.

Ellison's comments point to an entirely separate problem from the issue of identifying users with an ID card system.

Certainly, consolidating large amounts of data into a single database system would make it easier for federal agencies to share information with one another (as well as generate considerable revenue for the vendors providing the database system). But a master database (or any widely held source of trust) also magnifies the harm of unintentional errors and the benefit to those who successfully submit fraudulent entries.

As many have discovered before, the main problem with integration isn't just loading the data into one system; it's correctly correlating records that weren't entered with consistent formats. Matching up name, address and birth date data is a very expensive process, given formatting and data entry differences. Because of the serious consequences of errors in the system, such as not being able to board a plane or being detained by police, the database must be very carefully cleaned before being put to use.
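To make the correlation problem concrete, here is a minimal sketch of normalizing names and birth dates before comparing records. The helper names and the two accepted date formats are assumptions chosen for illustration; real record linkage would also have to handle nicknames, misspellings, addresses and ambiguous dates.

```python
import re
from datetime import datetime

# Assumed input formats; a slash date is read as U.S.-style month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def normalize_name(name):
    """Lowercase, reorder 'Last, First', drop punctuation and extra spaces."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    name = re.sub(r"[^a-z ]", "", name.lower())
    return " ".join(name.split())

def normalize_birth_date(text):
    """Return an ISO date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def match_key(name, birth_date):
    """Key used to decide whether two records may describe one person."""
    return (normalize_name(name), normalize_birth_date(birth_date))

# Two records entered in different formats produce the same key.
a = match_key("Smith, John", "03/07/1960")
b = match_key("john  SMITH", "1960-03-07")
print(a == b)
```

Even this small example has to guess that 03/07/1960 means March 7 rather than July 3, which is the kind of judgment that makes cleaning expensive and error-prone.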

A 1991 study by mortgage reporting company Consolidated Information Service that analyzed 1,500 reports from three large credit-rating agencies that try to perform large-scale information amalgamation found errors in 43 percent of the files, according to Simson Garfinkel in his book "Database Nation." Many of these errors are certainly minor, but authorities using the database must make judgments of which data to ignore and which to take seriously.

Credit-reporting agencies that do such database merging have found the process problematic. Robert Ellis Smith, publisher of the newsletter Privacy Journal, in Providence, R.I., noted that a 1991 Arthur Andersen study found 25 percent of consumers disputed the accuracy of their credit records. "In 87 percent of those disputes, the credit bureau agreed to correct the report," Smith said.

Creating a national ID database from the Social Security number database would also be difficult. The Social Security number is not a very secure piece of personal data, and it doesn't contain a check digit, so data entry errors are hard to detect.
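For readers unfamiliar with check digits, the sketch below uses the Luhn algorithm, the scheme found on payment card numbers, to show how one extra digit lets software catch a mistyped number. It illustrates the general technique only; it is not a proposal for how Social Security numbers are or should be formatted, and the function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk from the rightmost payload digit, doubling every other digit.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return (10 - total % 10) % 10

def luhn_is_valid(number: str) -> bool:
    """True if the last digit matches the check digit of the rest."""
    return luhn_check_digit(number[:-1]) == int(number[-1])

print(luhn_check_digit("7992739871"))   # check digit for a sample payload
print(luhn_is_valid("79927398713"))     # correctly entered number
print(luhn_is_valid("79927398813"))     # one digit mistyped
```

The Luhn scheme detects every single-digit typing error and most swaps of adjacent digits, which is why a number without any check digit is harder to validate at the point of entry.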