If I told you that a major browser maker has created a metadata search engine that tracks how Web pages are built, you’d readily assume such an offering was cooked up in Mozilla Labs.
Mozilla Labs, after all, is known for its steady stream of browser plug-ins that augment Firefox, with projects such as Snowl, Ubiquity and Geode garnering attention in the past few months.
Yet it’s also-ran browser maker Opera that is behind MAMA (Metadata Analysis and Mining Application), a search engine that pores over 3.5 million Web pages to index the markup, style, scripting and technology used to craft Web pages.
While Google, Yahoo and other general search engines help you find content based on text, MAMA answers such questions as, “How popular is Flash?”, “Can I get a sampling of Web pages that have more than 100 hyperlinks?” or “What does an average Web page look like?”
This type of information has more value for browser makers and standards bodies, for whom the structure of the Web is crucial.
MAMA is the brainchild of Brian Wilson, a QA (quality assurance) tester at Opera. Wilson told me MAMA’s roots date to 2004, when he and his team were looking for samples of certain types of code.
Test cases he created in QA were fine, but he preferred to see how developers were doing things in the real world, which could only be done effectively by trawling the Web. However, as programmers can attest, there has been little in the way of useful data about the state of the Web. Wilson devised MAMA to fill that void.
“We noticed the solution basically resembles a search engine, except that instead of the content on the Web page, it looked for all the markup and script components,” Wilson told me.
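To make that idea concrete, here is a minimal sketch of what indexing a page's structure rather than its text might look like. It is not Opera's code: the `MarkupStats` class, the `analyze` helper and every field name are hypothetical, and the Flash and CSS checks are rough heuristics I am assuming for illustration. It uses only Python's standard `html.parser` module.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupStats(HTMLParser):
    """Hypothetical collector: tallies page structure, ignores visible text."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()      # element name -> number of occurrences
        self.hyperlinks = 0        # <a> elements that carry an href
        self.uses_css = False      # <style>, a style attribute, or a linked stylesheet
        self.uses_script = False   # any <script> element
        self.uses_flash = False    # Flash embedded via <object> or <embed> (heuristic)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)        # attribute names arrive lowercased
        self.tags[tag] += 1
        if tag == "a" and "href" in attrs:
            self.hyperlinks += 1
        if tag == "script":
            self.uses_script = True
        if tag == "style" or "style" in attrs:
            self.uses_css = True
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            self.uses_css = True
        if tag in ("object", "embed"):
            # Assumed heuristic: look for a Flash MIME type or a .swf file name.
            hints = " ".join(str(v) for v in attrs.values() if v).lower()
            if "shockwave-flash" in hints or ".swf" in hints:
                self.uses_flash = True


def analyze(html):
    """Parse one page and return its structural statistics."""
    parser = MarkupStats()
    parser.feed(html)
    parser.close()
    return parser


page = """<html><head><link rel="stylesheet" href="site.css">
<script src="app.js"></script></head>
<body><a href="/one">One</a> <a name="anchor">no href</a>
<embed src="intro.swf" type="application/x-shockwave-flash"></body></html>"""

stats = analyze(page)
print(stats.hyperlinks, stats.uses_css, stats.uses_script, stats.uses_flash)
```

Run over a crawl instead of a single string, records like these could answer the kinds of questions Wilson describes: a query for pages with more than 100 hyperlinks becomes a filter on `hyperlinks`, and the popularity of Flash becomes the share of pages where `uses_flash` is true.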
What Does MAMA Mean for Programmers?
Wilson built the search engine to answer general questions such as “How many sites are mobile-ready?”, “How many sites use CSS (Cascading Style Sheets)?” or “How many markup errors does the average Web page have?” Wilson said MAMA enabled him to prioritize bugs and justify adding support for new technology, helping Opera make “product decisions based on what it could tell customers about what the Web actually looks like.”
Wilson also envisions MAMA being useful to Web standards bodies, such as the W3C (World Wide Web Consortium), giving Web authors an important voice. The W3C can use the data to measure the adoption rates of technologies.
MAMA is hardly ready for prime time. Wilson has not set a time frame for a launch beyond saying it will come in the coming months. He told me he needs to improve MAMA’s scalability and performance, which in their current state wouldn’t satisfy the majority of QA testers.
He also wants to integrate new features programmers request and do a full recrawl of MAMA, which is the last domino to fall before Opera spins out a public version of the search engine.
What’s interesting to me about MAMA is that it’s a project that won’t conflict with Google, Yahoo or Microsoft in top-line search. I can’t imagine any search vendor would pressure Opera into a sale.
If MAMA becomes a smash in the programming community whenever it’s released, I would expect a software vendor with a QA bent, such as IBM’s Rational group or HP’s Mercury team, to come calling for a possible buy. Most likely, it will be left alone.
MAMA is part of a crop of search engines targeting a specific information niche on the Web. Paglo and Splunk, for example, are search engines that help IT staff find the tools they need to do their jobs better.