The Great Firewall of China is no firewall after all.
The People's Republic of China has no firewall perched on its routers to enable censors to block Internet sites.
Rather, the authoritarian regime relies on a far more sophisticated censorship system that uses a keyword blacklist and routers that reach deep into Internet traffic to find forbidden words or phrases.
“Conventional wisdom was it's a firewall: all around the border, you'd be blocked. We found that sometimes [it takes a few hops within China to get blocked], up to 13 hops. Some paths weren't filtered at all,” Jed Crandall, an assistant professor of computer science at the University of New Mexico's School of Engineering, told eWEEK.
In fact, the “Great Firewall of China,” which researchers believe the government uses to keep users from reaching content it considers objectionable, is in reality a “panopticon”: a type of prison that relies on prisoners not being able to tell whether or not they're being observed.
The group of researchers, which includes some from the University of California-Davis, has found that what they're calling the Great Firewall of China (GFC) doesn't have to block every illicit word out there, only enough that users censor themselves because they know their online movements are being watched.
Indeed, some 28 percent of the Chinese hosts the researchers sent probes to were reachable along paths that weren't filtered at all, which disproves the idea that GFC keyword filtering happens on a firewall strictly at the border of China's connections to the Internet.
Firewall evasion takes on a more complex character, given that Chinese Internet users are tricked into thinking they're constantly being blocked. The researchers are therefore proposing an architecture for bypassing GFC keyword filtering that doesn't bother with firewall evasion at all.
Instead, they're working on a tool called ConceptDoppler that takes a surprising route: “spammifying” the words on China's blacklist. First, they have to discover what those words are. They're doing so by modulating packets, finding out how many hops the packets travel into China, and determining which specific routers are doing the blacklist filtering. Those routers, in fact, block the download of illicit content by sending TCP reset packets.
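The hop-counting step described above can be illustrated with a toy simulation. It assumes, as a simplification, that "modulating packets" means varying the IP time-to-live (TTL), so a probe carrying a blacklisted word only reaches the first n routers on the path. The smallest TTL that draws a reset then gives the hop count to the filtering router. `Router`, `probe` and `find_filtering_hop` are hypothetical names, and no real packets are sent.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    filters: bool  # hypothetical flag: True if this router inspects payloads and sends resets


def probe(path, ttl, payload, blacklist):
    """Simulate one probe that can visit at most `ttl` routers.

    Returns the name of the router that answered with a reset, or None
    if the probe expired (or reached its destination) without one.
    """
    for router in path[:ttl]:
        if router.filters and any(word in payload for word in blacklist):
            return router.name
    return None


def find_filtering_hop(path, payload, blacklist):
    """Raise the TTL one hop at a time; the first TTL that provokes a reset
    is the number of hops to the filtering router. None means unfiltered."""
    for ttl in range(1, len(path) + 1):
        if probe(path, ttl, payload, blacklist) is not None:
            return ttl
    return None
```

On a simulated path where only the third router filters, a probe containing a listed word first draws a reset at TTL 3, so `find_filtering_hop` reports 3. A path with no filtering router returns `None`, which corresponds to the unfiltered paths the researchers found.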
The researchers say ConceptDoppler will act as a kind of weather report on changes in Internet censorship in China and elsewhere. The tool uses algorithms that cluster words by meaning to identify keywords likely to be blacklisted in China. The researchers have identified 122 blacklisted words so far, but told eWEEK that the blacklist likely contains thousands.
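The article doesn't specify which clustering algorithm ConceptDoppler uses. As a simple stand-in, the sketch below ranks candidate terms by the cosine similarity of their per-document occurrence counts to a known blacklisted seed term. This is one basic way to surface words that tend to appear in the same contexts. All names and the toy corpus are hypothetical.

```python
import math


def term_vectors(documents, terms):
    """Map each term to a vector of its occurrence counts in each document."""
    return {t: [doc.count(t) for doc in documents] for t in terms}


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(documents, seed, candidates):
    """Order candidate terms by how closely their distribution across the
    documents resembles that of a known blacklisted seed term."""
    vecs = term_vectors(documents, [seed, *candidates])
    scores = {c: cosine(vecs[seed], vecs[c]) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)
```

Terms ranked near the top would be the ones worth testing against the filter first. Any real system would need a far larger corpus and likely a more robust semantic model than raw counts.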
Beyond a topological map of worldwide censorship, however, the researchers also plan to turn ConceptDoppler into a tool that will “spammify” blacklisted words. It would use the same techniques spammers use to evade filters: separating a word's characters or inserting random characters into it.
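Here is a minimal sketch of the character-separation trick, assuming the filter does plain substring matching. `spammify`, `disguise` and the `FILLERS` set are hypothetical, and this is not ConceptDoppler's code. A real filter might strip punctuation before matching, which would defeat this particular disguise.

```python
import random

FILLERS = ".-_*~"  # hypothetical separator characters inserted between letters


def disguise(phrase, rng):
    """Insert one random filler between every pair of adjacent characters.

    Fillers that already occur in the phrase are skipped, so no run of the
    disguised text can spell the original phrase.
    """
    if len(phrase) < 2:
        return phrase  # a single character cannot be split apart
    allowed = [c for c in FILLERS if c not in phrase]
    if not allowed:
        raise ValueError(f"no usable filler characters for {phrase!r}")
    out = [phrase[0]]
    for ch in phrase[1:]:
        out.append(rng.choice(allowed))
        out.append(ch)
    return "".join(out)


def spammify(text, blacklist, seed=None):
    """Replace every occurrence of each blacklisted phrase with a disguised
    variant so that a naive substring match on the phrase no longer fires."""
    rng = random.Random(seed)
    for phrase in blacklist:
        text = text.replace(phrase, disguise(phrase, rng))
    return text
```

For example, `spammify("the falun gong report", ["falun gong"], seed=1)` leaves the surrounding words intact but breaks up "falun gong" with filler characters. A reader can still make out the phrase, but an exact-match filter no longer sees it.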
“Spammers show us the way,” said Earl Barr, a graduate student in computer science at UC-Davis who's also an author of the researchers' paper, titled “ConceptDoppler: A Weather Tracker for Internet Censorship.”
“We could find out what the best spamming program is out there—[say], some evil Hungarian [program], and use that spam tool for good now,” Barr said.
In that scenario, modules on Web sites would signal when they're getting a connection from within China. Site operators who know their content contains blacklisted words could then run their responses through the spammifying tool and deliver content into China that escapes keyword filtering.
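The site-side module described above might look something like the following sketch. It assumes the operator keeps a list of address prefixes believed to sit behind the filter. The RFC 5737 documentation ranges used here are placeholders, not real Chinese allocations. `build_response` accepts any text transform, such as a spammifier, as a parameter.

```python
import ipaddress
from typing import Callable

# Hypothetical prefixes standing in for address blocks on filtered paths.
# These are RFC 5737 documentation ranges, used purely as placeholders.
FILTERED_PREFIXES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def client_is_filtered(client_ip: str, prefixes=FILTERED_PREFIXES) -> bool:
    """True if the client address falls inside any of the listed prefixes.
    IPv6 addresses never match the IPv4 prefixes, so they return False."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in prefixes)


def build_response(body: str, client_ip: str, transform: Callable[[str], str]) -> str:
    """Serve the page unchanged to most visitors, but pass it through
    `transform` (for example, a spammifier) for clients on filtered paths."""
    return transform(body) if client_is_filtered(client_ip) else body
```

Keeping the transform as a parameter means the detection logic and the disguising technique can change independently as the blacklist evolves.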
Many words and phrases on the blacklist are predictable, such as “Tibetan Independence Movement,” “Falun Gong,” “The right to strike,” “Tiananmen Square Hunger Strike Group” or “Voice of America.”
Some are surprising, such as “conversion rate,” “Mein Kampf” or “International geological scientific federation.” In some cases, their literal translations into Chinese characters look like possible spellings of other blacklisted words.
For example, a Wikipedia article about a state in western Germany, when translated into Chinese characters, uses some of the same characters as the phrase “Falun Gong.” Crandall could only speculate, however, as to why other phrases appear to be blacklisted.
“A friend of mine from China said they don't just block stuff they consider harmful to the government,” he said. “They block stuff considered … bad. That takes out most Web pages about World War II history.”
China is not alone in blocking Internet traffic. Canada and England block child pornography; Germany blocks Nazi-related material.
China is alone, however, in conducting keyword filtering at this level of sophistication, Crandall said. Iran conducts a simpler form of keyword filtering through Web proxies, whereas China's technique lets its routers probe deep into each individual page and avoid blocking entire sites. Keyword filtering is also a blunt instrument, Crandall said. For example, the word “massacre” appears on China's blacklist, which means any page that contains the word is off-limits.
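To make the contrast concrete, the sketch compares a per-page keyword check with a site-level block, using two hypothetical pages from the same host. It assumes exact substring matching. That is enough to show both why keyword filtering spares the rest of a site and why a single word such as "massacre" takes out an otherwise innocuous history page. The URLs and page text are made up.

```python
from urllib.parse import urlsplit


def blocked_by_keyword(page_text, blacklist):
    """Per-page check (assumed exact substring match): any occurrence of a
    blacklisted string blocks the page, regardless of context."""
    return any(word in page_text for word in blacklist)


def blocked_by_host(url, blocked_hosts):
    """Site-level check: every page on a blacklisted host is blocked."""
    return urlsplit(url).hostname in blocked_hosts


# Hypothetical pages on one host, for illustration only.
PAGES = {
    "http://history.example/ww2": "World War II history, including one wartime massacre",
    "http://history.example/recipes": "Recipes for steamed buns",
}

if __name__ == "__main__":
    for url, text in PAGES.items():
        print(url,
              "keyword-blocked:", blocked_by_keyword(text, {"massacre"}),
              "host-blocked:", blocked_by_host(url, {"history.example"}))
```

The keyword check blocks only the history page, while the host check blocks both. The history page is blocked no matter what its article actually says about the massacre.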
But while China's keyword filtering likely results in inadvertent blocking, from a censor's point of view it's an elegant approach. The problem with blacklisting IP addresses, for example, is that someone can simply mirror the content at a different IP address, Barr said.
Web proxying can deal with that kind of evasion, but it has scalability problems. Proxies force censors to run every piece of content through their own systems, which not only consumes resources but also creates a single point of failure. “It's very expensive to build proper capacity,” Barr said.
In any case, proxies are in practice protocol-specific, so they can be bypassed by users who agree to communicate on another port or to slightly modify the protocol.
The GFC is not only a more elegant approach that's harder to evade; it's also more interesting to researchers because of the information it gives away. China's firewall reveals what it's up to through its reset packets, and those packets can be monitored entirely from outside the country.
“You can do probing from entirely outside China because of the way keyword filtering works,” Crandall said. “We realized from outside China that we could 1) find out how many hops into China and where the routers are doing the filtering. We can modulate packets a certain way and look at packets that come back and know how many hops there were before it got to the router that did the resets. Also we can test words on the blacklist by sending a keyword, and if the reset comes back, you know it's blocked.”
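The keyword test Crandall describes, sending a word and checking whether a reset comes back, amounts to a simple loop. The sketch assumes a hypothetical `send_probe` callable that reports whether a probe drew a TCP reset. `simulated_filter` stands in for the network so the example runs offline, and the request format built by `make_probe_payload` is made up for illustration.

```python
from typing import Callable, Dict, Iterable


def make_probe_payload(word: str) -> str:
    """Hypothetical HTTP request carrying the candidate word in its query."""
    return "GET /search?q=" + word + " HTTP/1.1\r\nHost: example.com\r\n\r\n"


def check_keywords(candidates: Iterable[str],
                   send_probe: Callable[[str], bool]) -> Dict[str, bool]:
    """Send one probe per candidate word; True means a reset came back,
    i.e. the word appears to be on the blacklist."""
    return {word: send_probe(make_probe_payload(word)) for word in candidates}


def simulated_filter(blacklist):
    """Offline stand-in for the network: 'resets' any payload that
    contains a blacklisted word."""
    def send_probe(payload: str) -> bool:
        return any(word in payload for word in blacklist)
    return send_probe
```

Swapping `simulated_filter` for a function that actually sends traffic and watches for resets would turn this into a measurement tool. That part depends on raw-socket access and is deliberately left out.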
The researchers plan to get better measurements of Internet topology to figure out where keyword filtering is being done, and to measure from other source points besides UC-Davis to refine their findings. China may now be using still more sophisticated techniques, such as IP tunneling, and a better topology map could help the researchers determine whether that's the case.
At this point, relying on a single source is hampering the researchers as they try to figure out who's doing the blocking. What they do know is that China's largest ISP, ChinaNET, performed 83.3 percent of all filtering of their probes. They also know that 99.1 percent of all filtering occurred at the first hop past the Chinese border, that filtering occurred beyond the third hop for 11.8 percent of their probes, and that there were sometimes as many as 13 hops past the border before a filtering router was encountered.
And other countries that engage in censorship, the researchers said, are looking to copy China's techniques.
The researchers plan to present their work at the Association for Computing Machinery Computer and Communications Security Conference in Alexandria, Va., Oct. 29 – Nov. 2.