IBM Big Data Platform Adds Hadoop, Analytics Advancements

Written By Darryl K. Taft

Apr 4, 2013

IBM announced two new technologies to help enterprises tackle big data by making it simpler, faster and more economical to analyze massive amounts of data.

At an event at its Almaden Research Center in San Jose, Calif., on April 3, IBM announced BLU Acceleration and the IBM PureData System for Hadoop.

BLU Acceleration represents the work of hundreds of IBM developers and researchers in labs around the world and combines a number of techniques to dramatically improve analytical performance and simplify administration, IBM said. BLU Acceleration enables users to have much faster access to key information, leading to better decision making.

The software extends the capabilities of traditional in-memory systems, which load data into random access memory (RAM) instead of reading it from hard disks for faster performance, by providing in-memory performance even when data sets exceed the size of the memory. During testing, some queries in a typical analytics workload were more than 1,000 times faster when using the combined innovations of BLU Acceleration, IBM said.

Innovations in BLU Acceleration include “data skipping,” which skips over data that doesn’t need to be analyzed (for example, duplicate information); the ability to analyze data in parallel across different processors; and greater ability to analyze data transparently to the application, without the need to develop a separate layer of data modeling. Another innovation, called “actionable compression,” means data no longer has to be decompressed to be analyzed.
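To make two of these ideas concrete, here is a minimal Python sketch. It is not IBM's implementation, and every name in it (`Block`, `build_blocks`, `count_greater_than`, `dictionary_encode`, `count_equal_on_codes`) is hypothetical. It assumes that data skipping can be approximated by keeping a minimum and maximum for each block of a column, and that analyzing compressed data can be approximated by comparing dictionary codes rather than the original values.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Block:
    """A run of column values plus a min/max summary (hypothetical layout)."""
    values: list
    lo: int
    hi: int


def build_blocks(column, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for i in range(0, len(column), block_size):
        chunk = column[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_greater_than(blocks, threshold):
    """Count values > threshold; return (matches, number of blocks scanned)."""
    matched = scanned = 0
    for block in blocks:
        if block.hi <= threshold:
            continue  # the summary proves nothing here can match, so skip it
        scanned += 1
        matched += sum(1 for v in block.values if v > threshold)
    return matched, scanned


def dictionary_encode(column):
    """Replace each value with a small integer code from a sorted dictionary."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in column]


def count_equal_on_codes(dictionary, codes, value):
    """Evaluate `column == value` on the codes without decoding any row."""
    pos = bisect_left(dictionary, value)
    if pos == len(dictionary) or dictionary[pos] != value:
        return 0  # the value never occurs in the column
    return sum(1 for code in codes if code == pos)


# Illustrative data only, not taken from any IBM benchmark.
readings = [3, 5, 4, 90, 95, 91, 7, 6, 8]
skipping_result = count_greater_than(build_blocks(readings, 3), 50)

states = ["TX", "CA", "TX", "WA", "CA", "TX"]
dictionary, codes = dictionary_encode(states)
texas_rows = count_equal_on_codes(dictionary, codes, "TX")
```

In this toy run, only one of the three blocks has to be scanned to find the three readings above 50, and the count of "TX" rows is computed entirely on integer codes. A production engine would apply the same ideas to much larger blocks and far denser encodings.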

BNSF Railway Company, one of the largest freight rail transportation networks in North America, is using IBM BLU Acceleration to more quickly understand the vast amounts of data from the organization’s 1,700 servers that track maintenance, weather, scheduling, inventory, safety, deliveries and more. BNSF operates more than 1,400 trains a day on 32,500 route miles of track in 28 states and two Canadian provinces.

“BNSF transports many of the products and materials that we use every day in America and around the world, so tracking of these shipments is critical to our organization,” Kent Collins, a database solutions architect at BNSF, said in a statement. “Working with IBM we are now embracing our organization’s big data with the power of analytics. Thanks to the new technology, we’re performing tasks more quickly than ever before, for example, one of the queries improved over hundredfold, and our storage consumption went down by about 10 times. One of the things that impressed us the most about BLU Acceleration is its simplicity. We just load the data and run queries.”

“Big data is about using all data in context at the point of impact,” Bob Picciano, general manager of IBM Information Management, said in a statement. “With the innovations we are delivering, now every organization can realize value quickly by leveraging existing skills as well as adopt new capabilities for speed and exploration to improve business outcomes.”

Meanwhile, the new IBM PureData System for Hadoop is designed to make it easier and faster to deploy Hadoop in the enterprise. Hadoop is an open-source software framework used to organize and analyze vast amounts of structured and unstructured data, such as posts to social media sites, digital pictures and videos, online transaction records and cell phone location data.

The new system can reduce from weeks to minutes the ramp-up time organizations need to adopt enterprise-class Hadoop technology with easy-to-use analytics tools and visualization for both business analysts and data scientists. In addition, it provides enhanced big data tools for monitoring, development and integration with many more enterprise systems.

IBM PureData System for Hadoop is a key step in IBM’s overall strategy to deliver a family of systems with built-in expertise that leverages its decades of experience reducing the cost and complexity of information technology. This new system integrates IBM InfoSphere BigInsights, which enables companies to cost-effectively manage and analyze data and adds administrative, workflow, provisioning and security features.

Kelley Blue Book, which provides new and used car information, will be evaluating the new PureData System for Hadoop to analyze clickstream data created by users on its Website. The company will be able to analyze this information, including social media data, to see what topics visitors care most about, such as used and new vehicle prices, accident reports, safety recall and warranty data, and car shopper reviews.

“Kelley Blue Book collects all kinds of data from various sources, so managing the efficiency of data is critical to grow our business,” Steve Chow, vice president of technology and data intelligence for Kelley Blue Book, said in a statement. “We see many opportunities to leverage IBM’s offering as a strategic platform to expand on our analytic ecosystem and tap the value of social media, text and machine data to get a better view of our consumers and customers to improve their overall experience on KBB.com.”

Overall, the IBM Big Data Platform combines traditional data warehouse technologies with new big data techniques, such as Hadoop, stream computing, data exploration, analytics and enterprise integration, to create an integrated solution for enterprise big data needs.

IBM’s announcement also included the following new versions of IBM’s big data solutions:

1. A new version of InfoSphere BigInsights, IBM’s enterprise-ready Hadoop offering, makes it simpler to develop applications using existing SQL skills and adds compliance, security and high-availability features vital for enterprise applications. BigInsights offers three entry points: free download, enterprise software and now an expert integrated system, IBM PureData System for Hadoop.

2. A new version of InfoSphere Streams, unique “stream computing” software, enables massive amounts of data in motion to be analyzed in real time, with performance improvements, and simplified application development and deployment.

3. A new version of Informix includes TimeSeries Acceleration for operational reporting and analytics on smart meter and sensor data.

All the offerings will be available in the second quarter of 2013, except the PureData System for Hadoop, which will start shipping to customers in the second half of 2013.

Darryl K. Taft
