Syncsort's Hadoop ETL Solutions Provide Simplified Data Integration

Syncsort announces two new data integration solutions that strengthen Hadoop with enhanced extract, transform and load (ETL) capabilities.

Syncsort, a provider of big data integration and protection solutions, recently announced the availability of its Spring '13 release, including two brand-new Hadoop products and enhancements to its DMX technology that turn Hadoop into an easy-to-use extract, transform and load (ETL) solution.

Big data is prompting organizations to look at Hadoop to process more data in less time and for less money, but Hadoop is not yet a complete ETL solution. Syncsort's two new offerings for Hadoop, DMX-h ETL Edition and DMX-h Sort Edition, are designed to strengthen Hadoop by providing the full functionality required to deliver enterprise ETL capabilities. They provide greater ease of use and maximize node performance compared with non-native, code-generating ETL tools. In addition, performance and connectivity enhancements to DMX expand usage by end users and partners.

"Analyzing big data is critical to our customers' ability to sustain competitiveness, but the avalanche of information is breaking traditional data integration architectures—many of the tools are too code- and resource-intensive and ultimately drive costs too high," said Josh Rogers, senior vice president of the data integration business at Syncsort, in a statement. "With our new DMX editions, we are strengthening Hadoop by providing seamless and powerful ETL and sort capabilities and at the same time, reinvigorating the value proposition of ETL by leveraging the power of Hadoop to scale core processing of big data."

"Based on the evidence I have gathered talking with customers and in-the-weeds big data consultants, claims that Hadoop, and some non-Hadoop big data solutions, eliminate the need for ETL are patently false," wrote analyst Evan Quinn in a post on the Enterprise Strategy Group (ESG) blog. "Nothing solves data prep and understanding challenges like ETL. ETL forces the data analyst to dig into the details of all the raw data, and conceptualize what a perfect data set for analytics would look like—and this exercise also helps the data analyst determine the analytical possibilities. … Thus, it should also come as no surprise that ETL has thus far proven to be one of the most popular applications of Hadoop, and, if anything, ESG sees Hadoop-based ETL continuing to grow its fan base."

Moreover, Quinn added, "Syncsort DMX-h ETL Edition will help Hadoopists take a big data step forward in terms of ETL ease of development and performance."

"Cloudera sees ETL as one of the top use cases for Hadoop—it is essential to our mission of maximizing the value of big data," Amr Awadallah, chief technology officer at Cloudera, said in a statement. "We see Syncsort's new DMX-h offerings enabling our mutual customers with critical data integration and ETL capabilities which simplify ETL deployments while efficiently processing data natively on Hadoop. The CDH 4.2 release includes Syncsort's contribution to Apache Hadoop making the sort phase pluggable, enabling DMX-h, and broadening use cases on Hadoop."

The new DMX-h solutions take advantage of Syncsort's recent contribution to Apache Hadoop, which provides a unique level of native integration to deliver best-in-class data integration capabilities and Sort acceleration for Apache Hadoop distributions.

Highlights of DMX-h ETL Edition include an ETL engine that runs natively within MapReduce, maximizing node performance. It also provides Hadoop ETL without coding: developers can build jobs in a Windows GUI and deploy them seamlessly into Hadoop. In addition, it provides "use case accelerators," essentially a library of pre-built templates that help developers fast-track Hadoop ETL implementations, and it extends access to and delivery of all data, including data from the mainframe.

Recent Syncsort benchmarks show significant Hadoop performance and resource efficiency improvements when using DMX-h. The results show predictable and sustainable throughput even as data volumes grow. Using the TeraSort benchmark, DMX-h Sort Edition achieved a sustained throughput of more than 100MB per second per node, more than twice the 45MB per second per node achieved by Hadoop's native sort.
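
To put the benchmark figures in perspective, the short Python sketch below works through the arithmetic. Only the two per-node throughput numbers come from the benchmark as reported; the 1TB dataset, the 10-node cluster, and the assumption that throughput scales perfectly linearly with node count are hypothetical simplifications chosen for illustration, not results published by Syncsort.

```python
# Per-node sort throughput as reported for the TeraSort benchmark (MB/s).
DMX_H_MB_PER_S = 100      # reported as "more than 100", so treat as a lower bound
NATIVE_MB_PER_S = 45


def speedup(new_mb_per_s: float, baseline_mb_per_s: float) -> float:
    """Ratio of the new per-node throughput to the baseline."""
    return new_mb_per_s / baseline_mb_per_s


def sort_hours(dataset_tb: float, nodes: int, mb_per_s_per_node: float) -> float:
    """Idealized time to sort a dataset, in hours.

    Assumes 1 TB = 1,000,000 MB and perfect linear scaling across nodes,
    which real clusters will not achieve exactly.
    """
    total_mb = dataset_tb * 1_000_000
    seconds = total_mb / (nodes * mb_per_s_per_node)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical workload: 1 TB on a 10-node cluster.
    print(f"speedup: {speedup(DMX_H_MB_PER_S, NATIVE_MB_PER_S):.2f}x")
    print(f"DMX-h:   {sort_hours(1, 10, DMX_H_MB_PER_S):.2f} h")
    print(f"native:  {sort_hours(1, 10, NATIVE_MB_PER_S):.2f} h")
```

Under these assumptions the ratio works out to roughly 2.2, which is consistent with the "more than twice" claim, and the hypothetical 1TB sort drops from about 0.62 hours to about 0.28 hours.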