MySpace has both announced and open-sourced a new technology known as Qizmt, an internally developed framework for distributed computation based on the MapReduce framework.
Created by the Data Mining team at MySpace, Qizmt can be used
for many operations that require processing large amounts of data, such
as collaborative filtering for recommendations and analytics. MySpace
announced Qizmt on Sept. 15.
In a blog post about the technology, Mike Jones, chief operating officer at MySpace, said, "Qizmt is a powerful MapReduce-based
environment that enables MySpace user recommendation engines to become
smarter, faster and more reliable. Currently Qizmt is being used in the
People You May Know feature, and will soon enable us to expand user
recommendations to new areas."
Qizmt was developed using Microsoft technology, specifically the C# language. Said Jones:
"Qizmt is unique because it was developed using C#.NET specifically
for Windows platforms. This extends the rapid development nature of the
.NET environment to the world of large scale data crunching and enables
.NET developers to easily leverage their skill set to write MapReduce
functions. Not only is Qizmt easy to use, but based on our internal
benchmarks, we have shown its processing speeds to be competitive with
the leading MapReduce open source projects on a lesser number of cores."
The MapReduce software framework was developed by Google to support
distributed computing on large data sets on clusters of computers. The
framework is inspired by the map and reduce functions commonly used in
functional programming. Computational processing can occur on data
stored either in a file system or within a database. MapReduce
environments are used by sites with large amounts of data, such as
Google, Amazon and others.
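To make the map and reduce idea concrete, here is a minimal single-process sketch in Python that counts words across a handful of records. It is an illustration of the general pattern only, not Qizmt's API: the function names (map_phase, shuffle, reduce_phase, map_reduce) and the sample records are assumptions made for this example, and a real framework would spread each phase across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: fold the values for one key into a single total."""
    return key, reduce(lambda a, b: a + b, values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, chosen only for illustration.
counts = map_reduce(["music video music", "video friends"])
print(counts)
```

Because each call to map_phase depends only on its own record, and each call to reduce_phase depends only on the values for its own key, both steps can in principle be run in parallel across a cluster, which is what makes the pattern suitable for very large data sets.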
For his part, Jones said many companies leverage Microsoft
technologies in their business intelligence (BI) platforms and Qizmt is
a "natural extension" to these platforms. "As companies deal with
continued data growth and deeper analytics needs, Qizmt becomes a more
integral part of BI both from a data processing and a data mining
perspective," Jones said.
Describing how MySpace uses Qizmt, Jones added:
"MySpace's millions of users consume and produce video, music and
other content every minute, which constantly results in very large sets
of new data. Qizmt can process both data generated by our users (active
data) and data generated by our analytics system (passive data) and
transform it into meaningful recommendations virtually in real-time.
This will support the discoverability of new entertainment experiences
across music, videos, friends and more."