In all, Watts said it took an entire year of dealing with university officials and the Columbia Institutional Review Board to complete this process. The messages that Columbia researchers ended up with barely resembled the originals. In place of the personal data was what Watts described as "self-consistent but otherwise meaningless labels" that allowed him to look for patterns. "We can see that person A sent an e-mail to person B at time T, and that A and B were enrolled in class X at the time, but not who A and B were, or what the class was," Watts said.
Jean-Pierre Eckmann, of the math and theoretical physics department at the University of Geneva, had the same experience. His study, "Entropy of Dialogues," involved about 20 million e-mails sent over a period of 83 days among 10,000 people. "You can compare this to having access to the envelope of a letter, but not the letter itself," he wrote in an e-mail to eWEEK.
The IRB's administrative actions in the Columbia study, and those of review boards at other universities, are another important reason why e-mail studies of this magnitude don't require the consent of the e-mail writers. The IRB concluded in the case of the Columbia study that "no individual in the data set could be identified." That determination in turn led to the research plans being approved under a category called "secondary data." Informed consent by subjects is not required when data is regarded as secondary.
While university researchers were relatively open about what they go through, the extent to which e-mail is studied at private corporations remains shrouded in mystery. It appears that corporations such as Microsoft Corp. don't provide e-mail for study by outsiders, so the burden of supplying e-mails for research falls on universities. Typical of the corporate secrecy, a Microsoft spokesperson offered a relatively evasive response when asked whether the company studies e-mail patterns, how it does so, and whether correspondents are told.
"We dont have anything specific to share from an Information Worker perspective," the spokesperson said. The security and privacy precautions do little to make a dent in the rather tepid outside support the practice seems to have. In the wrong hands, privacy watchers like the Electronic Frontier Foundation contend that tens of millions of e-mails represent the doomsday scenario from a privacy perspective. Also to consider is the overall effect on how safe people feel when using the Internet. Some casual Web users expressed a palpable chill after learning that voyeurs, albeit acting for a good cause in a responsible manner, could be secretly studying all their personal information. Read more here about this and other creepy, Net-induced feelings. To some extent, the problems are an old one about trust and rising financial stakes. Its perhaps best summed up by noted technology commentator John Battelle: "As we move our data to the servers at Amazon.com, Hotmail.com, Yahoo.com, and Gmail.com, we are making an implicit bargain, one that the public at large is either entirely content with, or, more likely, one that most have not taken much to heart. "That bargain is this: We trust you to not do evil things with our information." Check out eWEEK.coms for more on IM and other collaboration technologies.
All the names and other identifying information had been removed from the e-mails, along with the contents of the messages and their subject headers. What remained were the time stamps and dates on which the messages were sent and received, along with the dates of any replies and the names of the people who wrote them.
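For readers curious how this kind of stripping can work in practice, here is a minimal sketch in Python. It is not the Columbia or Geneva researchers' actual procedure, which the article does not describe in detail. It assumes each message is a simple record with hypothetical `from`, `to`, `sent_at`, `subject` and `body` fields, and it uses a keyed hash to turn each address into a label that is consistent across messages but meaningless on its own.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key: generated once and discarded after processing,
# so the labels cannot be recomputed from a list of known addresses.
SECRET_KEY = secrets.token_bytes(32)


def label(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an address to a stable but meaningless label."""
    normalized = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return digest[:12]


def strip_message(msg: dict, key: bytes = SECRET_KEY) -> dict:
    """Keep only the 'envelope': who (as labels) and when.

    The subject line and body are dropped entirely.
    """
    return {
        "sender": label(msg["from"], key),
        "recipient": label(msg["to"], key),
        "sent_at": msg["sent_at"],
    }


if __name__ == "__main__":
    example = {
        "from": "alice@example.edu",
        "to": "bob@example.edu",
        "sent_at": "2004-01-05T09:30:00",
        "subject": "Lunch?",
        "body": "Are you free at noon?",
    }
    print(strip_message(example))
```

Because the same address always yields the same label under a given key, a researcher can still see that person A wrote to person B at time T without learning who A and B are.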