2012: A Cloudy Year for Big Data

"Convergence" is the word as big data moves further into the cloud and into the reach of small and midsize enterprises.

Truth be told, big data is not a big or new concept.

The idea behind big data has been around since the early days of mainframes and scientific computing. What is new about big data is the term itself, which has become part of today's business vocabulary. Still, for most of its existence, big data has been out of the reach of small and midsize businesses (SMBs) because the storage and processing power needed to make this technology work has been too expensive.

The cloud is changing that by bringing the necessary big data components to the masses in the form of hosted solutions. These new cloud-based capabilities are on a growth path and are creating more opportunities for even the smallest of businesses to leverage big data without the traditional expenses of compute farms and massive storage arrays.

Big data analytics comprises three primary elements: volumes of unstructured data, processing power and algorithms. Naturally, the biggest challenge for SMBs is the data itself: finding it, storing it and accessing it.

For it to be true big data, there has to be lots of it, and most SMBs don't generate that volume of data internally, which leads them to seek out alternative data sources. Here, the cloud delivers.

There are several large public data sets that are readily available, containing all types of information, including data from the U.S. Census Bureau, the World Bank and general public data from Google.

Additional data is available from government portals such as Data.gov, while data sets that span everything from Web traffic to social networking can be found on sites such as Crunchbase.com, Kasabi.com, Freebase.com, Infochimps.com and Kaggle.com. These Websites offer a variety of data types for use in analytics.

Throughout 2012, those data sets and others can be expected to grow exponentially. The amount of data being generated globally increases by 40 percent a year, according to the McKinsey Global Institute, the research arm of consulting firm McKinsey & Company.
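
To put the 40 percent figure in perspective, here is a minimal sketch of what that rate implies if it simply compounds year over year. The starting volume of 1.0 is an arbitrary unit chosen for illustration, not a measured quantity, and the function names are hypothetical.

```python
import math

# 40 percent a year, the growth figure cited in the article.
ANNUAL_GROWTH = 0.40


def projected_volume(start_volume: float, years: int, rate: float = ANNUAL_GROWTH) -> float:
    """Return the volume after `years` of compound growth at `rate` per year."""
    return start_volume * (1 + rate) ** years


def doubling_time(rate: float = ANNUAL_GROWTH) -> float:
    """Return the number of years needed for the volume to double at `rate`."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    # Start from an arbitrary 1.0 unit of data (an assumption for illustration).
    for year in range(6):
        print(f"year {year}: {projected_volume(1.0, year):.2f}x")
    print(f"doubling time: {doubling_time():.2f} years")
```

Under that assumption, the volume of data doubles roughly every two years and is more than five times its starting size after five years.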

However, data is only part of the equation. All this information needs to be organized, sorted and processed, and that takes computing power.

Once again, cloud services can deliver those capabilities. A key example is Amazon's Cluster Compute, which offers supercomputer-class processing power as an on-demand cloud service.

Amazon isn't the only one in the game: Companies such as IBM and Hewlett-Packard are offering private cloud-based big data analytics platforms. However, since this technology is designed as a complete platform and not as a service, these platforms are still out of the reach of the SMB market.

Other companies are looking to fill that void by offering on-demand analytic solutions that can process big data and deliver results quickly and inexpensively. A case in point is Aster Data, which offers a cloud-based, on-demand analytics platform, along with appliance-based and software analytics products. Another company looking to bring big data analytics into the cloud is 1010Data, which has developed a completely hosted big data analytics platform.

Still other firms are building momentum to convert big data analytics into cloud services. The most notable of these ventures is Splunk, which is known for software that analyzes large volumes of machine data. The company is currently working on Splunk Storm, a data analytics platform designed for cloud developers to build multitenant solutions. That way, the high costs of big data analytics can be spread out among multiple customers, creating economies of scale that should make the technology more affordable over time.