MapR appears to be the new traffic cop at the intersection of batch analytics, real-time data flow processing and new-gen enterprise storage. That’s a pretty good place to be standing, considering that many major IT trends now touch all of those components.
The San Jose, Calif.-based company on Dec. 8 introduced a converged data platform, MapR Streams. It integrates Apache Hadoop batch storage and in-memory analytics with global event streaming, real-time database capabilities, and enterprise storage, enabling customers to make more efficient use of their data.
If you think this product would make a great Christmas present, keep in mind that this is simply an announcement; the product won’t become available until early next year.
Apache Spark has been the missing link for speeding up Hadoop workloads. It brings fast, in-memory data processing to Hadoop. It also features development APIs in Scala, Java, R, and Python that let data workers efficiently execute streaming, machine learning or SQL workloads requiring fast iterative access to big data sets.
For its part, MapR Streams converges enterprise data-in-motion with data-at-rest to enable complete administrative control. No other platform provider offers this capability at this time.
Installing MapR Streams into an existing converged data platform enables organizations in any industry to continuously collect, analyze and act on streaming data. Examples of this include advertisers providing relevant real-time offers, health care providers improving personalized treatment, retailers optimizing inventory, and telecom carriers dynamically adjusting mobile service areas.
“The most significant part of this is that instead of making this a standalone product, we’ve integrated this into the MapR platform,” MapR Chief Marketing Officer Jack Norris told eWEEK. “This is not just for convenience or ease of administration; it’s really to compress that day-to-day action cycle. It’s for IT to build applications that impact business as it happens.
“The data stream, in fact, becomes a system of record,” he said.
IT trends in 2015 and 2016 helped IT managers realize that enterprises must improve their responsiveness to critical events using continuous analysis of big data. Just as continuous iteration of software is fast becoming a standard practice, so is real-time — or near real-time — data analysis.
“Even as traditional workloads become progressively more optimized, entirely new ones are arising – notably from the emergence of IoT data flows – that will dwarf previous volume and velocity demands and demand new stacks of hardware, networking and especially information management technology for processing, securing and distributing new varieties of data,” Gartner Research noted in a recent report.
MapR Streams can easily scale to handle massive data flows and long-term persistence while providing enterprise features such as high availability (HA), disaster recovery (DR), security, and full data protection, Norris said. Connecting Apache Hadoop and Spark with a top-ranked NoSQL database and continuous, reliable streaming at global scale is a huge step forward in enabling enterprise developers to create next-gen apps using big data, he added.
Using MapR Streams, developers have a continuous, converged and global data platform, allowing them to:
–easily build scalable, continuous high-throughput streams across thousands of locations with millions of topics and billions of messages;
–unite analytics, transaction, and stream processing to reduce data duplication, latency, and cluster sprawl while using existing open source projects like Spark Streaming, Apache Storm, Apache Flink, and Apache Apex;
–enable reliable message delivery with auto-failover and order consistency;
–ensure cross-site replication to build global real-time applications; and
–provide unlimited persistence of all messages in a stream.
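For readers who want a concrete picture of what ordered delivery and unlimited persistence mean in practice, here is a minimal Python sketch of an append-only message log. It is a toy, in-memory model and not MapR's API: the `StreamLog` class, its `publish`, `poll` and `replay` methods, and the `sensor-42` topic are all hypothetical names invented for illustration. It assumes only that messages in a topic are kept in publish order and are never deleted when read.

```python
from collections import defaultdict


class StreamLog:
    """Toy append-only log: one ordered, never-deleted message list per topic.

    Hypothetical illustration only; this is not the MapR Streams API.
    """

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> ordered messages
        self._cursors = defaultdict(int)   # (consumer, topic) -> next offset

    def publish(self, topic, message):
        """Append a message and return its offset within the topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, consumer, topic, max_messages=10):
        """Return the consumer's next unread messages in publish order."""
        start = self._cursors[(consumer, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._cursors[(consumer, topic)] = start + len(batch)
        return batch

    def replay(self, topic, from_offset=0):
        """Re-read history; polling never removes anything from the log."""
        return list(self._topics[topic][from_offset:])


log = StreamLog()
for reading in (71, 72, 75):
    log.publish("sensor-42", reading)

first_batch = log.poll("dashboard", "sensor-42", max_messages=2)
second_batch = log.poll("dashboard", "sensor-42")
history = log.replay("sensor-42")
```

Because each consumer only advances its own cursor, a second application could read the same topic from the beginning at any time, which is the sense in which a stream can serve as a system of record. A production system would add the replication, failover and security features described above, none of which this sketch attempts.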
MapR works closely with numerous leading technology partners, such as data Artisans, Databricks, DataTorrent, StreamSets, and Syncsort, to provide users with the flexibility to choose the components they want in their real-time analytics data platforms.
MapR Streams will be generally available in early 2016, Norris said.