For months, marketing and online-services companies have dreaded the arrival of the General Data Protection Regulation (GDPR), the pro-privacy rules protecting European citizens that went into force on May 25. Yet few anticipated the impact the rules would have on another group: security researchers.
Worried about running afoul of the regulations, a number of domain-name registrars have limited access to the formerly public database listing contact information for the owners and technical contacts of domains. The Whois database maintained by those registrars has long been a useful first step for security researchers tracking down malicious actors.
Similar services have shut down as well. A blockchain startup that included information about whether a wallet owner had passed a background check shuttered its service. And academic and industry researchers worry that their databases used to track down bad actors could expose them to legal liability.
In fact, the GDPR has garnered polar opposite reactions from developers and security professionals, said Guy-Vincent Jourdan, an associate professor of electrical engineering and computer science at the University of Ottawa, who described the reactions at two conferences he recently attended.
"I was at a web conference, and there, people were uncorking champagne—everyone was celebrating about GDPR, because it was so great and they were excited about it," he said. "While at the security conference, everyone was crying and saying this was the end of the world."
The GDPR aims to curtail the unwanted use of data and give consumers more control over their own data. Companies that use Whois data for mass emails, and spammers who use it for fraud schemes, will violate the rules. Publishing identifying data—which, under the regulation, includes IP addresses and the public addresses used in many blockchain implementations—along with sensitive information also violates the GDPR.
Security researchers have traditionally found uses for public data in ways that were not intended. If those methods reveal the subject's identity, the researchers could violate sections of the GDPR. It will take both time and due diligence for security researchers to determine whether their investigative methods are impacted by the regulations.
"It is important for us, as researchers, to think about the data we are gathering and collecting," said Richard Ford, chief scientist at security software firm Forcepoint. "There are still ways to do it right, but it is just a little bit harder."
While the GDPR is intended to protect European citizens, because researchers cannot always know whose data they are collecting, the rules will hamper research in general, experts said. Here are five areas of research that are, or could be, impacted by the EU's General Data Protection Regulation.
When companies and individuals register a domain, their information is placed in a public database known as the Whois database. Large domain name registrars, such as GoDaddy, maintain a server that provides that information to anyone who asks using a web form or a service known as the domain lookup service on port 43.
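The port 43 service mentioned above is a plain-text TCP exchange defined in RFC 3912: the client sends the domain name followed by CRLF and reads the reply until the server closes the connection. The sketch below, in Python, shows the shape of such a query; the default server name is illustrative (the Verisign registry server for .com), and any registrar may answer with more or less detail, as the GoDaddy example that follows shows.

```python
import socket

def build_query(domain: str) -> bytes:
    # Per RFC 3912, a Whois query is just the name plus CRLF
    return domain.encode("ascii") + b"\r\n"

def whois_lookup(domain: str, server: str = "whois.verisign-grs.com") -> str:
    """Minimal port 43 Whois lookup: send the query, read until EOF."""
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as `whois_lookup("example.com")` returns whatever record the server is willing to disclose, which is exactly the field-by-field disclosure that the GDPR has now put in question.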
In May, with the GDPR looming, leading registrar GoDaddy removed details for all 57 million domains registered through its service, answering so-called "port 43 queries" with only the organization, state or province, and country. Queries made through its website return the full Whois record, unless the registrant's address is in a country protected by the GDPR.
While the lack of registration information could pose problems for researchers, the data in the database is usually not that useful for identifying bad actors directly; it is more valuable for detecting patterns in ownership, said Allan Liska, senior solutions architect at Recorded Future.
"Whois has been a very valuable tool for researchers, but [has been] diminishing in value over the past few years," he said. "Bad guys tend to use fake information, but they tend to reuse that fake information, so it can still make connections and be valuable."
Companies have published "anonymized" data for research purposes in the past, only to find that the data actually allowed the identification of some of the people whose information was included in the data set. In 2006, for example, the research arm of internet service America Online released a data set containing the search data of 658,000 subscribers. Yet a variety of sensitive queries—such as "can you adopt after a suicide attempt" and searches about incest—as well as location data, and even Social Security numbers, appeared in the data set.
For security researchers working with network telemetry data or information harvested from PCs, the dangers of de-anonymization—and a GDPR violation—are real.
"Most types of telemetry are not impacted, but you have to be careful when you are gathering telemetry to make sure that you are anonymizing the data," said Forcepoint's Ford. "If it is data-centric telemetry, GDPR is most likely not an issue. But when you are doing human-centric research, with anomalies in people's behavior, those data sets become even more difficult to manage under GDPR."
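The anonymization step Ford recommends is often implemented as keyed pseudonymization: direct identifiers such as IP addresses are replaced with a keyed hash before the telemetry is stored or analyzed. The sketch below is illustrative only (the function name and key handling are assumptions, not any vendor's implementation), and it is worth noting that under the GDPR, pseudonymized data can still count as personal data as long as the key exists, so this reduces risk rather than eliminating it.

```python
import hashlib
import hmac

# Hypothetical per-deployment secret; in practice this would be stored
# separately from the telemetry and rotated on a schedule.
SECRET_KEY = b"rotate-me-regularly"

def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier (IP address, username) with a keyed hash.

    The same identifier always maps to the same token, so researchers can
    still correlate events, but the raw value never leaves the collector.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Example: scrub the source IP from a telemetry record before storage
record = {"src_ip": "203.0.113.7", "bytes_sent": 1420}
record["src_ip"] = pseudonymize(record["src_ip"])
```

Because the mapping is deterministic, anomaly detection over a user's behavior still works on the tokens; because it is keyed, an outsider cannot simply hash candidate IP addresses to reverse it.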
Blockchain technologies that allow information to be harvested from the ledger have already run afoul of the GDPR.
In late May, for example, blockchain services firm Parity shut down its Parity ICO Passport Service (PICOPS) a day before the GDPR went into force. The service allowed owners of wallets to pass an ID background check, confirming that they were not from a restricted set of countries or on a watch list. Because the wallet is seen as an identifier, the service had to comply with the GDPR.