It’s not about the blogosphere’s grass roots getting little price tags hung on them, as with America Online Inc.’s purchase of Weblogs Inc.
It’s about infrastructure that’s sagging.
“There’s a lot of hype regarding how much the blogosphere has grown, and it has, and a lot of it is legitimate blogs with real content. But there’s a scandal under the surface,” Mike Graves, chief architect of VeriSign’s Naming and Directory Services, told eWEEK.com.
“Google and a lot of free services have millions of spam blogs out there that are increasingly choking up bandwidth but also making it hard to find content [users] want,” Graves said.
Statistically, VeriSign and the other companies it has talked to are finding that the problem goes beyond sheer traffic volume. Some ping services put spam “pollution” at 20 to 30 percent; elsewhere it spikes as high as 50 to 70 percent at times.
It’s easy to see. Weblogs.com’s front page, once the poor, creaking thing manages to load, is crammed full of junk: “free info about http://student_loan,” “Free info about http://fight.search,” “FREE info at http://az21amazingdeal.” It goes on and on.
“We think parts [of the blogosphere] will break because of load and growth,” Graves said.
The increasingly crushing load is due both to legitimate blog traffic and to a burgeoning number of spam blogs, aka splogs.
“For a long time, ping servers could be stood up as a single box running on a fast business DSL connection,” Graves wrote on VeriSign’s Infrablog.
“Last Thursday weblogs.com processed just under 2 million (1.96M) pings for the day,” Graves wrote. “When we started talking with Dave [Winer, owner and founder of Weblogs.com], a couple months back, the ping totals were barely half of that, and the load even then on the servers made pinging Weblogs a chancy proposition during peak posting times.”
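To put that figure in perspective, a back-of-the-envelope sketch converts the daily total into an average rate. The average understates the real strain, since pings bunch up during peak posting times. The spam figures are a purely hypothetical illustration: they apply the 20 to 30 percent range some ping services report to Weblogs.com’s total, which nobody quoted here has claimed.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(pings_per_day: int) -> float:
    """Mean pings per second if the day's load were spread perfectly evenly."""
    return pings_per_day / SECONDS_PER_DAY


def spam_pings(pings_per_day: int, spam_share: float) -> int:
    """Spam pings per day for an assumed pollution share between 0.0 and 1.0."""
    return round(pings_per_day * spam_share)


if __name__ == "__main__":
    daily = 1_960_000  # Weblogs.com's total for the Thursday Graves cites
    print(f"average load: {average_rate(daily):.1f} pings/s")
    # Hypothetical: the 20-30 percent range reported by some ping services.
    for share in (0.20, 0.30):
        print(f"{share:.0%} spam: {spam_pings(daily, share):,} pings/day")
```

Run as a script, it prints an average of 22.7 pings per second, and 392,000 and 588,000 spam pings a day for the 20 and 30 percent cases.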
Indeed, Graves wrote, the days of running a ping server on a single box with a fast connection have passed, at least for popular ping servers, and pings are well on their way to requiring serious infrastructure.
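For readers who haven’t seen one, a ping is a tiny XML-RPC request. The sketch below assumes the classic two-parameter weblogUpdates.ping method (blog name, blog URL); it only builds and decodes the request body with Python’s standard xmlrpc.client module and sends nothing over the network. Two things stand out: nothing in the payload proves the sender controls the URL it names, and every one of those daily pings is an XML document the server must parse.

```python
import xmlrpc.client


def build_ping(blog_name: str, blog_url: str) -> str:
    """Serialize a weblogUpdates.ping call as an XML-RPC request body."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def parse_ping(body: str) -> tuple:
    """Decode a request body the way a ping server would: (method name, params)."""
    params, method = xmlrpc.client.loads(body)
    return method, params


if __name__ == "__main__":
    body = build_ping("Example Blog", "http://blog.example.com/")
    print(body)
    print(parse_ping(body))
```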
Nobody knows that better than services that rely on the ping service, such as PubSub Concepts Inc. or Technorati Inc. As far as PubSub is concerned, the infrastructure of the blogosphere has to be supported properly to get to the next step in its evolution.
PubSub is a matching service that notifies subscribers when new content is created that matches their subscriptions. With its proprietary Matching Engine, PubSub reads millions of data sources to do so.
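PubSub’s Matching Engine is proprietary, and its internals aren’t described here. As a rough illustration of the general idea only, the hypothetical sketch below stores each subscription as a set of keywords, indexes subscriptions by keyword, and reports which subscribers an incoming item matches. A real service would also handle phrases, ranking and vastly larger volumes.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Matcher:
    """Toy matcher: a subscription fires when all of its keywords appear in an item."""

    def __init__(self) -> None:
        self.subs = {}                 # subscriber id -> frozenset of keywords
        self.index = defaultdict(set)  # keyword -> ids of subscriptions using it

    def subscribe(self, sub_id: str, query: str) -> None:
        terms = frozenset(tokenize(query))
        self.subs[sub_id] = terms
        for term in terms:
            self.index[term].add(sub_id)

    def match(self, item_text: str) -> list:
        words = set(tokenize(item_text))
        # Only subscriptions sharing at least one word with the item get a full check.
        candidates = set()
        for word in words:
            candidates |= self.index.get(word, set())
        return sorted(s for s in candidates if self.subs[s] <= words)
```

For example, after `subscribe("alice", "VeriSign ping")`, matching the item “VeriSign buys the Weblogs.com ping server” returns `["alice"]`. The index means each new item is compared only against subscriptions that share a word with it, rather than against every subscription.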
“This is a very positive thing,” said Salim Ismail, PubSub CEO and co-founder. “Spam is now becoming a problem. … As of late, the [Weblogs.com] site hasn’t been working well. What [VeriSign] will do is rebuild it.”
Creating a Solid Foundation for Web 2.0
A rebuilt Weblogs.com, along with a host of other infrastructure improvements that companies in this space are now working on, is essential to PubSub’s entry into the next stage of the blogosphere’s evolution, in which PubSub’s recently announced Structured Blogging format will make it easier to publish and find information on the Web.
Structured Blogging lets users add different styles and tags to each type of blog entry they post. These styles and tags ensure that movie and book reviews don’t look like plain text but instead show up as calendar or journal entries, and that each content type is marked up in XML so that automated search services and other applications can quickly recognize and process it.
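As a rough illustration of why typed XML helps machines, the sketch below wraps a review in XML and then filters reviews by subject type without reading any prose. The element and attribute names (`entry`, `kind`, `subject`, `rating`) are hypothetical stand-ins, not the actual Structured Blogging schema.

```python
import xml.etree.ElementTree as ET


def make_review(title: str, subject_type: str, subject: str, rating: int, text: str) -> str:
    """Wrap a review post in typed XML (hypothetical element names)."""
    entry = ET.Element("entry", {"kind": "review"})
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "subject", {"type": subject_type}).text = subject  # e.g. movie, book
    ET.SubElement(entry, "rating", {"max": "5"}).text = str(rating)
    ET.SubElement(entry, "body").text = text
    return ET.tostring(entry, encoding="unicode")


def find_reviews(docs: list, subject_type: str) -> list:
    """Pick out (subject, rating) pairs for one subject type using the tags alone."""
    hits = []
    for doc in docs:
        entry = ET.fromstring(doc)
        subject = entry.find("subject")
        if entry.get("kind") == "review" and subject is not None and subject.get("type") == subject_type:
            hits.append((subject.text, int(entry.findtext("rating"))))
    return hits
```

An aggregator handed a mix of entries can pull out every movie review and its rating this way, while untyped posts are simply skipped.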
At any rate, the consensus is that VeriSign is the perfect pick for the job of rebuilding the foundations on which such a new blogosphere will grow. “VeriSign, at heart, is sort of a Web infrastructure company,” said Marc Strohlein, vice president and lead analyst for Outsell Inc.
“I think they’ve clearly … looked at blogs and RSS and said, ‘It’s valuable and important technology, but out of the box, it isn’t terribly scalable,’” Strohlein said. “They sense there’s a business opportunity in making it more secure, more robust and more industrial-strength, so it can be used in broader ways than it has.”
VeriSign’s strategy makes sense because it owns all the PKI and domain addressing technology already, Strohlein said. “They’re right in the thick of that.”
The fact that Weblogs.com has been creaking under the load wasn’t news to Winer. In December 2004 he posted a plea for help in rewriting the code for Weblogs.com.
“With Typepad, MSN Spaces and Blogger and a gazillion other blogs pinging weblogs.com, the server, which is written in scripts, has met its match,” Winer said. “It’s needed a rewrite in C for some time, now it really needs a rewrite.”
VeriSign knows its stuff, but how exactly will it stem the rising tide of splog? Graves told eWEEK.com that the company plans a three-pronged approach: through contextual analysis, authentication and heuristics that can trace splog to the tools commonly used to spawn it.
He pointed to Google Inc.’s Blogger.com as a good example of a service whose search and textual analysis tools quickly filter out splog.
“If you go to Blogger.com’s front page and view a random blog, if you click through that random blog, you generally get very good quality blogs,” he said. “They’re all readable, done by humans, made for human consumption.
“On the back side, we see pings from Blogger.com, and an enormous number are splogger pings,” Graves said. “Google obviously has a filtering mechanism in its own perimeter. They use search and textual analysis tools, quickly and fairly accurately, I’d say. You’d likely find two or three out of 50 that are spam blogs.”
VeriSign is working on similar analysis tools. One tactic is to look at the content of a post: What’s the subject matter? Does it seem to have been lifted as a block of text from another post? Is it trying to get readers to click on some form of solicitation? Most blogs do have links, so how do splog links differ?
Textual analysis is not an easy technique, but it has proved effective.
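VeriSign hasn’t published its analysis tools, so the following is only a hypothetical sketch of how the questions above might become numeric signals: the share of a post’s five-word runs (“shingles”) already seen elsewhere, for lifted text; hits on solicitation phrases; and link density. The phrase list, weights and saturation points are invented for illustration.

```python
import re

# Hypothetical solicitation phrases; a real list would be far longer and maintained.
SOLICITATION = ("free info", "click here", "buy now", "cheap", "casino", "poker", "loan")


def words(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def shingles(text: str, k: int = 5) -> set:
    """All runs of k consecutive words, used to spot text lifted from elsewhere."""
    w = words(text)
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def splog_score(post: str, corpus: list) -> float:
    """Score in [0, 1]; higher means more splog-like. Weights are illustrative only."""
    n = max(len(words(post)), 1)
    lowered = post.lower()
    # Link density: one link per ten words or more counts as fully suspicious.
    link_density = min(len(re.findall(r"https?://", post)) * 10 / n, 1.0)
    # Solicitation: three or more phrase hits saturate the signal.
    solicit = min(sum(lowered.count(p) for p in SOLICITATION) / 3, 1.0)
    # Copying: share of this post's shingles already seen in the corpus.
    mine = shingles(post)
    seen = set().union(*(shingles(doc) for doc in corpus))
    copied = len(mine & seen) / len(mine) if mine else 0.0
    return round(0.4 * copied + 0.3 * solicit + 0.3 * link_density, 2)
```

Substring matching on phrases is crude (it will also flag “loans” or “cheaper”), which is one reason a production filter would combine many more signals.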
The second technique is to analyze a blog’s origin. VeriSign is looking at identity authentication using DomainKeys, a public-key standard that Yahoo Inc. came up with and that Google and other ISPs adopted for e-mail authentication. E-mail, after all, was the original patient to suffer from spam, Graves pointed out.
As things now stand, blog protocols are so simple that it’s child’s play to write a blog and send a ping claiming the post is from Gizmodo or some other popular blog. While it’s not that difficult to catch such an impostor, users may mistake it for a legitimate update to a popular blog, click on it and find themselves at an online poker page, for example.
VeriSign will therefore look at supplying a lightweight digital hash or signature for blog posts. And instead of keeping a simple blacklist of sploggers, the company could, for example, set up and maintain a kind of credit score for bloggers.
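VeriSign hasn’t specified its scheme. As a minimal sketch under stated assumptions, the code below swaps in HMAC-SHA256 with a secret shared between each blog and the ping server, because Python’s standard library has no public-key signatures; a DomainKeys-style design would instead sign with a private key and publish the public key in DNS. The tag covers the blog URL, the post URL and a SHA-256 hash of the post body, so a forged ping claiming to come from a popular blog fails verification. The `Reputation` class is an equally hypothetical take on a bloggers’ “credit score”: good pings nudge a score up, bad ones knock it down hard.

```python
import hashlib
import hmac


def sign_post(secret: bytes, blog_url: str, post_url: str, body: str) -> str:
    """Hypothetical: HMAC-SHA256 over the URLs plus a SHA-256 hash of the post body."""
    content_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    message = "\n".join((blog_url, post_url, content_hash)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_post(secret: bytes, blog_url: str, post_url: str, body: str, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_post(secret, blog_url, post_url, body), tag)


class Reputation:
    """Toy per-blog 'credit score': new blogs start neutral; bad pings cost more than good ones earn."""

    def __init__(self, start: int = 50, reward: int = 1, penalty: int = 10) -> None:
        self.start, self.reward, self.penalty = start, reward, penalty
        self.scores: dict = {}

    def record(self, blog_url: str, ok: bool) -> int:
        score = self.scores.get(blog_url, self.start)
        score = min(100, score + self.reward) if ok else max(0, score - self.penalty)
        self.scores[blog_url] = score
        return score

    def trusted(self, blog_url: str, threshold: int = 40) -> bool:
        return self.scores.get(blog_url, self.start) >= threshold
```

The asymmetric reward and penalty is a design choice: a splogger can’t cheaply rebuild trust after a burst of bad pings, unlike a blacklist entry that simply disappears when the spammer moves to a new URL.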
Privacy guardians take note: Authentication doesn’t mean personally identifiable information. “We’ve become very keen on making sure we take pains to differentiate that identity is one thing, but authentication and credentials is another,” Graves said. “We can provide identity tools. They can be anonymous. … There’s no privacy issue involved. Identity is a way to tell A from B, just to tell that A is not B. … As long as you know I’m not somebody I’m trying to pose as, that’s very useful information.”
Heuristics are the third tool. Reverse-engineering tools can examine blogs to determine whether they’ve been created by specific tools from the splogger community.
“They’re fairly sophisticated, template-driven to provide a specified blog with appropriate posts and pages, a thousand at a time,” Graves said. “But inevitably they leave some kind of structure, some kind of signature behind, that you can say, ‘Ah ha, that has the fingerprint of tools out there that create these blogs.’”
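One hypothetical way to look for that kind of structural fingerprint, sketched below, is to strip each page down to its skeleton of tags and class names, hash it, and flag skeletons shared by many different blogs. This is an assumption about how such heuristics could work, not VeriSign’s method, and it has an obvious weakness: legitimate blogs built from the same hosted template share skeletons too, so a real system would need further signals.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class _Skeleton(HTMLParser):
    """Collects start tags and their class attributes, ignoring all text and URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        self.parts.append(f"{tag}.{cls}")


def fingerprint(html: str) -> str:
    """Short hash of a page's tag/class skeleton."""
    parser = _Skeleton()
    parser.feed(html)
    parser.close()
    return hashlib.sha256("/".join(parser.parts).encode("utf-8")).hexdigest()[:16]


def suspicious_clusters(pages: dict, min_size: int = 3) -> list:
    """Group page URLs by skeleton; return groups shared by at least min_size blogs."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) >= min_size]
```

Because text and link targets are discarded, pages stamped out from one template land in the same group even when every keyword and affiliate link differs.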
It’s all very community-minded, but VeriSign of course plans to make money off the Weblogs.com purchase.
While basic pings processed by Weblogs.com will remain free to submit and retrieve, VeriSign plans to layer paid services on top, Graves said. Over time, the company will offer value-added services to publishers and consumers, much as Yahoo provides basic e-mail for free and charges for extra storage, domain hosting or integrated Web sites.
Beyond that looms the enterprise market, and therein lies perhaps one of the biggest incentives for getting the blogosphere’s plumbing right, if not for VeriSign, then for the services that are already angling to catch that fish.
“NewsGator has a big footprint in aggregating for the enterprise,” Graves said. “[At this point], it’s like selling e-mail without spam filtering. Enterprise is holding it at arm’s length and will continue to do so until there’s some improvement in quality.”