MySpace has both announced and open-sourced a new technology known as Qizmt, an internally developed framework for distributed computation based on the MapReduce framework.
Created by the Data Mining team at MySpace, Qizmt can be used for many operations that require processing large amounts of data, such as collaborative filtering for recommendations and analytics. MySpace announced Qizmt on Sept. 15.
In a blog post about the technology, Mike Jones, chief operating officer at MySpace, said, "Qizmt is a powerful MapReduce-based environment that enables MySpace user recommendation engines to become smarter, faster and more reliable. Currently Qizmt is being used in the 'People You May Know' feature, and will soon enable us to expand user recommendations to new areas."
Qizmt was developed using Microsoft technology, specifically the C# language. Said Jones:
"Qizmt is unique because it was developed using C#.NET specifically for Windows platforms. This extends the rapid development nature of the .NET environment to the world of large scale data crunching and enables .NET developers to easily leverage their skill set to write MapReduce functions. Not only is Qizmt easy to use, but based on our internal benchmarks, we have shown its processing speeds to be competitive with the leading MapReduce open source projects on a lesser number of cores."
The MapReduce software framework was developed by Google to support distributed computing on large data sets across clusters of computers. The framework is inspired by the map and reduce functions commonly used in functional programming. Computational processing can occur on data stored either in a file system or within a database. MapReduce environments are used by sites with large amounts of data, such as Google, Amazon and others.
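To make the map and reduce idea concrete, here is a minimal single-process sketch in Python of the classic word-count pattern. It is only an illustration of the general MapReduce model and does not use Qizmt's API, which is written in C#; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: fold all values for one key into a single total."""
    return key, reduce(lambda a, b: a + b, values, 0)


def word_count(documents):
    # On a real cluster each document could be mapped on a different machine;
    # here everything runs sequentially in one process (an assumption made
    # purely to keep the sketch short).
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, not real MySpace data.
counts = word_count(["music video music", "video friends"])
```

In a distributed framework the map calls and the per-key reduce calls are independent of one another, which is what lets the work be spread across many cores or machines.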
For his part, Jones said many companies leverage Microsoft technologies in their business intelligence (BI) platforms and Qizmt is a "natural extension" to these platforms. "As companies deal with continued data growth and deeper analytics needs, Qizmt becomes a more integral part of BI both from a data processing and a data mining perspective," Jones said.
Describing how MySpace uses Qizmt, Jones added:
"MySpace's millions of users consume and produce video, music and other content every minute, which constantly results in very large sets of new data. Qizmt can process both data generated by our users (active data) and data generated by our analytics system (passive data) and transform it into meaningful recommendations virtually in real-time. This will support the discoverability of new entertainment experiences across music, videos, friends and more."