Opera 'MAMA' Search Engine Strikes Dulcet Tones for Web Developers

Web browser maker Opera releases MAMA, a search engine that, unlike general engines from Google or Yahoo, finds information about the structure of Web pages. This is a useful application development tool for quality assurance testers who want to know how Web services are composed. MAMA could be attractive to companies selling QA tools, such as IBM or HP, but it will likely remain a niche tool from Opera.

If I told you that a major browser maker has created a metadata search engine that tracks how Web pages are built, you'd readily assume such an offering was cooked in Mozilla Labs.

Mozilla Labs, after all, is known for its frequent browser plug-ins to augment Firefox, with projects such as Snowl, Ubiquity and Geode garnering attention in the past few months.
Yet it's also-ran browser maker Opera that is behind MAMA (Metadata Analysis and Mining Application), a search engine that pores over 3.5 million Web pages to index the markup, style, scripting and technology used to craft Web pages.
While Google, Yahoo and other general search engines help you find content based on text, MAMA answers such questions as, "How popular is Flash?", "Can I get a sampling of Web pages that have more than 100 hyperlinks?" or "What does an average Web page look like?"
This type of information has more value for browser makers and standards bodies, for whom the structure of the Web is crucial.
MAMA is the brainchild of Brian Wilson, a QA (quality assurance) tester at Opera. Wilson told me MAMA's roots date to 2004, when he and his team were looking for samples of certain types of code.
Test cases he created in QA were fine, but he preferred to see how developers were doing things in the real world, which could only be done effectively by trolling the Web. However, as programmers can attest, there has been little in the way of effective data about the state of the Web. Wilson devised MAMA to fill that void.
"We noticed the solution basically resembles a search engine, except that instead of the content on the Web page, it looked for all the markup and script components," Wilson told me.