Popularity comes with a price, as Web publishers of XML syndication feeds are learning the hard way.
According to feed publishers, as the use of Really Simple Syndication news feeds grows, so too do the bandwidth they consume and the demands they put on Web servers. Some Weblogs and technology Web sites are rethinking the way they publish their RSS feeds as they find that frequent requests from newsreaders, the applications that aggregate feeds, can strain their resources.
“Publishers are being caught off guard by how popular and how fast this stuff grows,” said Greg Reinacker, president and founder of NewsGator Technologies, of Denver, Colo., a newsreader developer. “It's sort of one of the prices you pay for being able to notify users quickly when new content is available. There's a bandwidth cost to pay.”
The debate over RSS bandwidth impact reemerged in the past two weeks after Microsoft Corp. started tinkering with its Microsoft Developer Network RSS feeds. Initially, the Redmond, Wash., software maker scaled back the content of its feed, which had provided the full postings from its 967 bloggers.
The feed went from full text to a limit of 500 characters per posting earlier this month, catching the attention of bloggers and developers, some of whom decried the change. One of Microsoft's own technology evangelists, Robert Scoble, wrote in his blog that “RSS is broken.”
Early reports pointed the blame at bandwidth, but Sara Williams, MSDN product unit manager, said the cause instead was a reevaluation of how to efficiently serve such a large, and ever-growing, feed.
“All of MSDN is a rounding error compared to all of the downloads on Windows Update,” Williams said of the bandwidth. “The increased traffic we get because of RSS is completely negligible.”
But the increased traffic did raise eyebrows. Williams said the company's Web-hosting operations group had noticed that the file size for the MSDN blog page, being served by the full-text feed, had reached 400KB, a number far outside the typical range for a Web page.
That led MSDN to rethink its rollup RSS feed. Along with the main MSDN feed of all blog postings, each blog also has its own feed.
Then last week, following calls from developers and bloggers to bring back full text, MSDN reversed course. The aggregated feed again is providing the full blog postings, though MSDN did continue to keep summaries, now up to 1,250 characters, for the feed that appears on the blogs.msdn.com site.
In the longer term, though, MSDN is looking to better segment the feed into topical ones so that developers can subscribe to a subset of blog postings on a topic like security or Visual Basic, Williams said. The way RSS is distributed and read also needs to evolve.
“There are a bunch of opportunities to be smarter in the way clients ping servers for updates and the way servers cache RSS information as far on the edge as they can,” Williams said. “Interesting innovations can happen, but at a high level the technology is further ahead than the tools.”
There is no single solution to the potential bandwidth bottleneck of RSS, experts say. Some, like Williams, suggest distributing RSS feeds throughout the Internet. Others want newsreaders to be more stringent in limiting the frequency of polling feeds. All seem to point to problems in RSS implementations as the culprit.
“The problem is that the aggregators, the most popular ones, let users poll whenever they want, and that's not fair,” said Dave Winer, a co-author of the RSS format and publisher of the Scripting News blog. “Content developers need to have a say in that.”
Most newsreaders are set by default to check every hour for updates, but most also allow users to change the interval to as little as every few minutes or seconds. Some even set lower intervals by default.
“It just takes one poorly written newsreader to go out there and every 2 seconds query the page, and suddenly the Web server can't serve the normal page,” said Patrick McGovern, director of SourceForge.net, one of the Open Source Technology Group's developer Web sites.
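The arithmetic behind that concern can be sketched roughly. The example below assumes that every poll downloads the whole feed, with no caching at all. The 400KB size is the MSDN figure cited above, and the 1,000-subscriber audience is purely hypothetical.

```python
SECONDS_PER_DAY = 86_400


def daily_feed_bytes(feed_size_bytes, poll_interval_seconds, subscribers):
    """Bytes served per day if every subscriber downloads the full feed on every poll."""
    polls_per_day = SECONDS_PER_DAY / poll_interval_seconds
    return feed_size_bytes * polls_per_day * subscribers


FEED_SIZE = 400 * 1024  # 400KB, the size reported for the MSDN blog page
SUBSCRIBERS = 1_000     # hypothetical audience size, not a figure from the article

# 1,000 well-behaved readers polling once an hour
hourly_total = daily_feed_bytes(FEED_SIZE, 3600, SUBSCRIBERS)

# one misbehaving reader polling every 2 seconds
runaway_client = daily_feed_bytes(FEED_SIZE, 2, 1)

print(f"1,000 hourly readers: {hourly_total / 1e9:.2f} GB per day")
print(f"one 2-second reader:  {runaway_client / 1e9:.2f} GB per day")
```

Under these assumptions, a single reader polling every 2 seconds pulls more data in a day than 1,000 readers polling hourly.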
Dealing With the RSS
SourceForge.net and its news cousin Slashdot.org are on the lookout for overly zealous newsreaders as a way to combat bandwidth hogging by RSS. Slashdot.org has an aggressive policy where it will ban newsreaders that check for its feeds more than once every half hour.
SourceForge.net, on the other hand, watches RSS traffic as part of its overall check for abuse. If any IP address is checking too many pages in too short a time, it could be banned. McGovern declined to offer details on the threshold for banning but said the problem is not RSS itself.
What is needed is better caching of news feeds so that they are not constantly being pulled from the single server of the publisher, he said.
“We like RSS feeds because they're a great way to get our content out to more sites and get more traffic to our sites,” McGovern said.
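A polling policy like the half-hour rule described above for Slashdot.org might be enforced along the following lines. This is only a sketch, not either site's actual implementation. The `PollPolicy` class, the use of an IP address as the client identifier and the three-strike tolerance are all assumptions made for illustration.

```python
from collections import defaultdict

MIN_INTERVAL_SECONDS = 30 * 60  # half-hour floor, per the Slashdot.org policy above
MAX_STRIKES = 3                 # hypothetical tolerance before a ban; not from the article


class PollPolicy:
    """Tracks feed requests per client and bans clients that poll too often."""

    def __init__(self, min_interval=MIN_INTERVAL_SECONDS, max_strikes=MAX_STRIKES):
        self.min_interval = min_interval
        self.max_strikes = max_strikes
        self.last_seen = {}               # client id -> time of the latest request
        self.strikes = defaultdict(int)   # client id -> count of too-early requests
        self.banned = set()

    def allow(self, client_id, now):
        """Return True if the request should be served, False if refused."""
        if client_id in self.banned:
            return False
        previous = self.last_seen.get(client_id)
        self.last_seen[client_id] = now
        if previous is not None and now - previous < self.min_interval:
            # Too soon since the last request: record a strike and refuse.
            self.strikes[client_id] += 1
            if self.strikes[client_id] >= self.max_strikes:
                self.banned.add(client_id)
            return False
        return True


policy = PollPolicy()
print(policy.allow("192.0.2.10", now=0))     # first request from this address
print(policy.allow("192.0.2.10", now=120))   # two minutes later, inside the half-hour window
```

Keying on IP address is the simplest choice, but it treats everyone behind a shared proxy as one client, which is one reason a real deployment would likely need something more forgiving.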
Winer also suggested that one component of the RSS 2.0 specification already allows better cooperation between publishers and newsreaders. Called “time to live,” or ttl, the element lets a content publisher tell a newsreader how often it should check for fresh content, Winer explained.
However, the ttl element is based on the honor system, Winer said. For it to be successful, a newsreader must abide by the time interval set in the ttl tag. Winer said one way to enforce it could be for a publisher to give newsreaders a period of time, say 60 days, in which to support the interval or face being banned.
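On the newsreader side, honoring ttl could look something like the sketch below. In RSS 2.0 the ttl value is a number of minutes. The function name, the sample feed and the rule of taking the longer of the user's setting and the publisher's ttl are assumptions for illustration, not behavior taken from any particular newsreader.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DEFAULT_INTERVAL_MINUTES = 60  # the hourly default most newsreaders use, per the article

# Hypothetical feed used only for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <ttl>120</ttl>
  </channel>
</rss>"""


def polling_interval_minutes(rss_xml: str, user_preference: Optional[int] = None) -> int:
    """Return how long to wait before the next poll, never less than the feed's ttl."""
    channel = ET.fromstring(rss_xml).find("channel")
    ttl_text = channel.findtext("ttl") if channel is not None else None
    try:
        ttl = int(ttl_text.strip()) if ttl_text else 0
    except ValueError:
        ttl = 0  # a malformed ttl is treated as absent
    requested = DEFAULT_INTERVAL_MINUTES if user_preference is None else user_preference
    # The publisher's ttl acts as a floor under whatever the user asked for.
    return max(requested, ttl)


# A user who asks for 5-minute polling is held to the publisher's 120 minutes.
print(polling_interval_minutes(SAMPLE_FEED, user_preference=5))
```

Nothing here stops a reader from ignoring the value, which is the honor-system weakness Winer describes.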
While a useful tool, ttl has other limitations, Reinacker said. It only covers one XML syndication specification, RSS 2.0, and not the other RSS varieties or the Atom format.
Another way feed publishers can limit too much polling is the use of HTTP caching headers, where sites can notify readers whether any new content has been published since the last check, Reinacker said. NewsGator, and most major newsreaders, support HTTP caching headers, he said.
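The HTTP mechanism Reinacker refers to is generally known as a conditional GET. The newsreader sends back the `ETag` or `Last-Modified` value it received on its previous fetch, and the server answers `304 Not Modified` with an empty body if nothing has changed. The server-side sketch below is a simplified illustration. The function name and the hash-based ETag are assumptions, and it ignores details such as multiple ETags in one header and case-insensitive header names.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def respond_to_feed_request(request_headers, feed_body, last_modified):
    """Return (status, headers, body) for a feed request, using 304 when possible.

    last_modified is expected to be a timezone-aware datetime with whole seconds.
    """
    etag = '"' + hashlib.sha256(feed_body).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified.timestamp(), usegmt=True),
    }

    unchanged = False
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When the client sends an ETag, it takes precedence over the date check.
        unchanged = if_none_match.strip() == etag
    else:
        since = request_headers.get("If-Modified-Since")
        if since:
            try:
                unchanged = last_modified <= parsedate_to_datetime(since)
            except (TypeError, ValueError):
                unchanged = False  # unparseable date: fall back to a full response

    if unchanged:
        return 304, headers, b""    # tiny response, no feed body
    return 200, headers, feed_body  # full feed


feed = b"<rss version='2.0'><channel><title>Example</title></channel></rss>"
modified = datetime(2004, 9, 1, 12, 0, 0, tzinfo=timezone.utc)

status, headers, body = respond_to_feed_request({}, feed, modified)
repeat_status, _, repeat_body = respond_to_feed_request(
    {"If-None-Match": headers["ETag"]}, feed, modified
)
print(status, len(body))
print(repeat_status, len(repeat_body))
```

The saving comes only from the second kind of response, so a reader that never sends the validators back gets the full feed on every poll.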
NewsGator also sets updating by default at every hour but lets users change the setting. Reinacker said that newsreaders have to be careful about limiting users' ability to poll more frequently for updates, or they could undermine the value of RSS. Some enterprises, for example, are using RSS for real-time notifications.
“Publishers are paying a bandwidth cost right now, but in doing so they're delivering content in a way that hasn't been done before,” Reinacker said. “They are able to notify users of changes to their site as opposed to waiting for them to come to their site.”
Winer agreed that publishers have to expect to pay some cost for the extra bandwidth of RSS traffic. While he doesn't think that most blogs and publishers are facing an undue bandwidth burden, he said that the full industry does need to discuss a more collaborative solution.
“It's basically when this industry is ready to face these issues that it's going to happen,” Winer said.
Until then, perhaps those most affected by RSS bandwidth hogging won't be large Web sites such as MSDN but smaller bloggers who develop a decent following, such as Gary Lawrence Murphy.
Murphy, who publishes the blog Teledyn, last November began noticing traffic spikes from his RSS feed. It reached a point where he was exceeding his ISP's daily bandwidth limit under his hosting plan.
He also noticed that despite his use of HTTP caching, many newsreader implementations didn't follow the specification correctly, leading him both to “break” the way his feed delivered its time stamp and to pare it down by removing HTML and limiting posts to 180 characters.
The changes stemmed the bandwidth tide, but Murphy said his experience should serve as a warning to other smaller blogs like his.
“Any site that becomes popular is going to be killed by their RSS,” Murphy said.