Syncsort, a provider of big data and mainframe software, has upgraded its DMX-h data integration software to enable enterprise organizations to work with mainframe data in Hadoop or Spark in its native format.
Syncsort delivered the new capabilities because some of its large enterprise customers—particularly those in financial services, banking, insurance and health care—needed to maintain their mainframe data in its native format for compliance purposes, the company said.
Tendu Yogurtcu, general manager of Syncsort’s big data business, told eWEEK that while many of Syncsort’s large enterprise customers want the scalability and cost benefits of Hadoop and Spark for their mainframe data, converting that data for the big data platforms presents compliance challenges because these customers are required to preserve the data in its original EBCDIC format.
“With this announcement we are basically saying that many of our use cases, which are based on our collaboration with very large customers in financial services, banking and insurance, where regulatory compliance is very critical, need access to their mainframe data in native format,” Yogurtcu said. “They want to get access to this mainframe data. However, changing the data format can cause governance and compliance issues. This new feature involves making Hadoop understand this EBCDIC encoded mainframe record format.”
The mainframe data has to remain in mainframe format for audit purposes or for archival purposes, she said.
Yogurtcu said that last summer Syncsort open-sourced some Apache Spark packages and mainframe connectors to make mainframe data available for interactive queries via Spark SQL. “And all of these moves, until now, required that that mainframe data be EBCDIC encoded in mainframe-specific format to be translated into something an open system can understand,” she said.
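To make the translation problem concrete, the following is a minimal Python sketch, not Syncsort's code, of why EBCDIC-encoded mainframe records are opaque to open systems until they are decoded. The fixed-width record layout here (a 10-byte name field and a 6-byte account field) is hypothetical, and Python's built-in cp037 codec stands in for the mainframe's EBCDIC code page:

```python
# Minimal illustration (not Syncsort's implementation): decoding a
# fixed-width EBCDIC mainframe record with Python's built-in cp037 codec.
# The record layout is hypothetical: 10-byte name, 6-byte account ID.

# An EBCDIC-encoded record, as it might arrive from a mainframe dataset.
record = "JANE DOE  000042".encode("cp037")

# The raw bytes are unintelligible to ASCII/UTF-8 tooling...
print(record[:4])  # EBCDIC bytes, not b"JANE"

# ...until each fixed-width field is sliced and decoded.
name = record[0:10].decode("cp037").rstrip()
account = record[10:16].decode("cp037")
print(name, account)
```

Syncsort's new feature works in the opposite direction: rather than performing this kind of translation up front, it teaches Hadoop and Spark to read the EBCDIC bytes directly, so the stored data never leaves its original format.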
The technology Syncsort open-sourced is an IBM z Systems mainframe connector for Apache Spark. The contribution enables enterprises to access and gain new insights from their mainframe data with Apache Spark’s analytics capabilities and Spark SQL. Yogurtcu said Syncsort is betting that Spark will play a major role in next-generation use cases, including the Internet of Things. This added to the company’s push to transform mainframe data into a format that Spark can easily understand. Syncsort’s mainframe connector for Spark is similar to the Apache Sqoop mainframe connector that Syncsort released as open source in 2014.
Moreover, based on results of a survey from January of this year, Syncsort identified Spark as one of the key hot trends for 2016. According to the survey, nearly 70 percent of respondents said they are most interested in Apache Spark. Interest in Spark surpassed interest in all other compute frameworks, including the recognized incumbent, MapReduce, which was noted by 55 percent of respondents.
Yet Yogurtcu said that while Syncsort expects MapReduce to remain the prevalent compute framework in production, the high level of interest should translate into more Spark deployments, mostly running on Hadoop.
Apache Spark is an open-source data processing engine built for speed, ease of use and sophisticated analytics. It is designed to perform both batch processing and new workloads like streaming, interactive queries and machine learning. Spark and Hadoop are not competitors, as Hadoop does things that Spark doesn’t.
While many Hadoop vendors and users are replacing the MapReduce computation framework with Spark, there also is the Hadoop ecosystem as a whole, which includes the HDFS storage system and NoSQL key-value stores like HBase. Spark doesn’t do storage; it works with existing storage systems.
Last September, Syncsort announced the integration of the “Intelligent Execution” capabilities of its DMX data integration product suite with Apache Spark. Intelligent Execution enables users to visually design data transformations once and then run them anywhere—across Hadoop, MapReduce, Spark, Linux, Windows or Unix, on-premises or in the cloud, the company said.
To help mainframe users facing challenges getting data into Hadoop, Syncsort introduced its new high-speed DMX Data Funnel, which enables users to ingest hundreds of database tables at once.
“The second part of our announcement is about making access to mainframe data as simple as possible,” Yogurtcu told eWEEK. “Two of our big insurance customers had hundreds and hundreds of tables that they needed to transform data from. So we are shipping a tool called Data Funnel and with that you can access 800 to 1,000 tables at once. It parallelizes data access and brings all of these tables in parallel. Access to large volumes of data at once is the second part of our announcement. This is to increase productivity and improve development time.”
With the new Data Funnel, users can now take hundreds of tables, and in one step, load them into the Hadoop Distributed File System (HDFS), she said.
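Data Funnel's internals aren't public, but the parallel-ingestion pattern Yogurtcu describes can be sketched in a few lines of Python. The `extract_table` function below is a hypothetical stand-in for a real database-to-HDFS copy step, and the table names and worker count are illustrative:

```python
# A sketch of the parallel-ingestion pattern Data Funnel describes
# (not Syncsort's actual implementation). Each table is extracted
# concurrently; extract_table is a placeholder for a real
# database-to-HDFS copy step.
from concurrent.futures import ThreadPoolExecutor

def extract_table(table_name: str) -> str:
    # Placeholder: a real version would query the source database
    # and write the rows out to an HDFS landing path.
    return f"/data/landing/{table_name}"

# Hundreds of source tables, ingested in one step.
tables = [f"policy_{i:03d}" for i in range(800)]

with ThreadPoolExecutor(max_workers=32) as pool:
    hdfs_paths = list(pool.map(extract_table, tables))

print(len(hdfs_paths))  # all 800 tables landed in parallel
```

The key point is that the per-table work is independent, so hundreds of extracts can run side by side instead of one after another, which is where the productivity gain Yogurtcu mentions comes from.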
In addition, with new support for Fujitsu NetCOBOL, Syncsort supports both IBM z Systems and Fujitsu mainframes. This move comes in response to strong demand in the Asia Pacific and Central and Eastern Europe, Middle East and Africa (CEMEA) markets, the company said.
“Syncsort continues to leverage their mainframe and big data expertise to solve complex technology issues that prevent organizations from leveraging Hadoop and Spark to store, process and analyze their mainframe data,” said George Gilbert, lead big data analyst at Wikibon, in a statement. “Syncsort’s new features don’t require hard-to-find skills that companies don’t want to spend money and time to acquire.”