The E-Mail Petri Dish Grows

By Ben Charny  |  Posted 2006-01-20

In the name of science, Columbia University behavioral scientists recently dissected every e-mail sent by every Columbia student for a year.

During the year it took to procure the e-mail and the ensuing months of research on the 14 million or so missives, the 43,000 correspondents never knew they were the subjects of such a grand experiment.

Shockingly, the rather quiet way in which Columbia went about conducting its recently unveiled study is not an isolated case.

Professional and scholastic researchers said during recent interviews that they are digesting increasingly large amounts of e-mail, and, as continues to be the practice, the correspondents never have a clue what's going on. Researchers said they do not feel compelled to get their permission.

Despite a backlash, many researchers say they are actively trying to find ways to fill their research with even more missives, a sign of just how unstoppable the forces at work might be.

For instance, Bernardo Huberman, senior fellow and director of the Information Dynamics Lab at Hewlett-Packard Co., said he's trying to top a recent study that involved the mail of a few hundred HP employees with one using the e-mail of 50,000 workers.

All the limit-stretching has sparked significant recent debate in a number of areas, such as whether it's feasible, or even possible, to require consent before studying someone's e-mail, and what the appropriate balance is between fulfilling a research need and protecting the privacy rights of test subjects.

What comes from the discussions over the next few weeks and months promises to alter the future for both the university researchers who rely on e-mail and the enterprises that seek to benefit from their work, either by following through with products or services based on the research or by providing the necessary equipment and services for the research.

University and private researchers assured eWEEK that to a person, they respect the privacy rights of the people involved in these studies.

"I think there are plenty of spooky privacy issues associated with online activity, and I agree that the trade-off between science and privacy is an issue we'll be thinking about more and more in the future," Duncan Watts, a researcher at the Institute for Social and Economic Research and Policy at Columbia, wrote in an e-mail.

"But I don't think the data in this study should be on your list of worries, compared with the kind of data that Google [Inc.] or Yahoo [Inc.] or AOL [America Online Inc.] or Amazon[.com Inc.], for example, routinely collect and analyze."

In some ways, it's inevitable that people's private communications are routinely reviewed to uncover even more personal details about them, and no one in charge feels the need to tell anybody about it.

Researchers now have at their disposal cutting-edge techniques such as e-mail "electroscopy," which dissects a body of e-mail correspondence in new and more revealing ways. The approach rests on a simple principle: The more e-mail it gets to probe, the more exact and useful the results.

To a person, researchers interviewed for this story agreed that the obvious way to appease those reacting with shock to the news that their mail is part of a petri dish for science or private research is to seek their permission before studying the messages.

But as studies grow in size and thereby effectiveness, getting consent from the subjects becomes much more difficult.

Besides, they explain, researchers are rarely after what's inside the e-mails. Rather, they study the time an e-mail was sent, how long the conversation lasted and other rather innocuous details, to fit against a larger context.

Still, there's a lot of emphasis on the precautions taken to make the mail anonymous before it gets to researchers.

In all, Watts said it took an entire year of dealing with university officials and the Columbia Institutional Review Board to complete this process.

The messages that Columbia researchers ended up with barely resembled the originals.

All the names and other identifying information had been removed from the e-mails, along with the contents of the mail and their subject headers. What remained were the time stamps and dates when the messages were sent and received, the dates of any replies and the names of the people who wrote them.

In place of the personal data was what Watts described as "self-consistent but otherwise meaningless labels" that allowed him to look for patterns.
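
As a rough illustration of what "self-consistent but otherwise meaningless labels" can mean in practice, here is a minimal Python sketch. It is not the procedure Columbia used, which the article does not describe in detail; the `Record` and `Pseudonymizer` names, the sample addresses and the choice of random hex labels are all assumptions made for the example. The idea is simply that each real address is swapped for one random label, and the same address always gets the same label, so patterns survive while identities do not.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Envelope-level data only; no subject line or body is stored."""
    sender: str
    recipient: str
    timestamp: str  # kept unchanged; only identities are relabeled


class Pseudonymizer:
    """Hypothetical helper: maps each identifier to one random, opaque label."""

    def __init__(self):
        self._labels = {}

    def label(self, identifier):
        # The same input always returns the same label (self-consistent),
        # but the label itself reveals nothing about the input.
        if identifier not in self._labels:
            self._labels[identifier] = secrets.token_hex(8)
        return self._labels[identifier]


def anonymize(records):
    """Return relabeled copies of the records using one shared label table."""
    p = Pseudonymizer()
    return [
        Record(p.label(r.sender), p.label(r.recipient), r.timestamp)
        for r in records
    ]


# Made-up sample data: one message and its reply.
raw = [
    Record("alice@example.edu", "bob@example.edu", "2005-03-01T09:15"),
    Record("bob@example.edu", "alice@example.edu", "2005-03-01T09:40"),
]
anon = anonymize(raw)
```

In this sketch the sender of the first record and the recipient of the second carry the same label, so a researcher can still see that a reply came back 25 minutes later. The label table lives only inside `anonymize`, so nothing in the output maps a label back to a person.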

"We can see that person A sent an e-mail to person B at time T, and that A and B were enrolled in class X at the time, but not who A and B were, or what the class was," Watts said.

This was the same experience for Jean-Pierre Eckmann, of the math and theoretical physics department at the University of Geneva. His study, "Entropy of Dialogues," involved about 20 million e-mails sent over a period of 83 days among 10,000 people.

"You can compare this to having access to the envelope of a letter, but not the letter itself," he wrote in an e-mail to eWEEK.

The IRB's administrative actions in the Columbia study, and those of review boards at other universities, are another important reason why e-mail studies of this magnitude don't require the consent of the e-mail writers.

The IRB concluded in the case of the Columbia study that "no individual in the data set could be identified." The determination in turn led to the research plans being approved under a category called "secondary data." Informed consent by subjects is not required when data is regarded as secondary.

While university researchers were relatively open about what they go through, the extent to which e-mail is studied at private corporations remains shrouded in mystery.

It appears that corporations such as Microsoft Corp. don't provide e-mail for study by outsiders, so the burden of supplying e-mails for research then falls on universities.

Typical of the corporate secrecy, a Microsoft spokesperson offered a relatively evasive response when asked whether the company studies e-mail patterns, how it does so, and whether correspondents are told. "We don't have anything specific to share from an Information Worker perspective," the spokesperson said.

The security and privacy precautions do little to make a dent in the rather tepid outside support the practice seems to have.

Privacy watchers such as the Electronic Frontier Foundation contend that, in the wrong hands, tens of millions of e-mails represent the doomsday scenario from a privacy perspective.

Also to consider is the overall effect on how safe people feel when using the Internet.

Some casual Web users expressed a palpable chill after learning that voyeurs, albeit acting for a good cause in a responsible manner, could be secretly studying all their personal information.

To some extent, the problem is an old one about trust and rising financial stakes. It's perhaps best summed up by noted technology commentator John Battelle:

"As we move our data to the servers at,,, and, we are making an implicit bargain, one that the public at large is either entirely content with, or, more likely, one that most have not taken much to heart.

"That bargain is this: We trust you to not do evil things with our information."
