Cray Unveils Open Appliance for Enterprise Analytics Workloads

The company's Urika-GX system marries Cray's supercomputing technologies, like the Aries interconnect, with open enterprise tools like Hadoop.

Cray officials are using the company's deep expertise in supercomputing to help enterprises address the challenges of big data analytics.

The systems vendor on May 24 announced the Urika-GX, a supercomputing system that comes preloaded with open enterprise analytics software, including the Hortonworks Data Platform (which bundles Hadoop and Apache Spark) and the OpenStack management suite.

The result is a system that combines the performance needed to process and analyze the massive amounts of data being generated with the tools enterprises are accustomed to using, according to Ryan Waite, senior vice president of products at Cray.

"There's so much data that's being created, and being able to search through all this data is hard," Waite told eWEEK.

Cray is no stranger to data analytics. Over the past few years, the company has rolled out two such systems, the Urika-XA and Urika-GD appliances, aimed at the high-performance computing (HPC) space. HPC organizations, such as the national laboratories, have deep expertise both in the technologies and in integrating them, he said.

However, enterprises tend not to have the same integration capabilities, and when they buy big data systems they are often stuck doing the integration work themselves, Waite said. They need appliances that are pretested and preintegrated with the software, he said. Enterprises can have the new system up and running within days.

"Those [HPC] customers love getting under the hood," Waite said. For enterprises, "with Urika-GX, we tend to do a lot of [the integration work] for them."

Enterprises also are struggling with the data center sprawl created by the increasing compute demands of complex analytics workloads, resulting in what Waite called "franken-clusters." Companies build one cluster for one analytics task, a second cluster for another task and a third for yet another workload. The result is inefficient data centers populated by multiple compute clusters.

The Urika-GX is designed to address the issue by combining Cray's Aries supercomputing interconnect with other Cray offerings, including its cluster architecture, the Cray Graph Engine and the preintegrated, open features of the Urika-XA appliance, so that multiple disparate workloads can run within the same appliance, he said.

The new system, which will be available in the third quarter but is being beta tested by some customers now, is powered by Intel's Xeon "Broadwell" chips (up to 1,728 cores per system). It includes 22 terabytes of memory, 35TB of solid-state drive (SSD) storage and the Aries interconnect, which officials said delivers the network performance needed for the most complex analytics workloads. Initial configurations will offer 16, 32 and 48 nodes in a standard 42U, 19-inch rack, and larger configurations will come in the second half of the year. The system supports such tools as Cassandra, Kafka and Lustre, and offers Cray's Sonexion as a storage option.

Waite said Cray is targeting two key types of customers with the new appliance. One is the data scientist, who is looking for the best products for complex workloads.

"They're constantly exploring whatever are the latest, most novel tools to run analytics," he said.

The second is IT teams, which must deal with the data center sprawl and integration challenges.

The Urika-GX is the latest Cray system to embrace such open software as Hadoop, Apache and Mesos for dynamic configuration. It's also the first Cray system to use OpenStack, and Waite said that in the future, all Cray systems will be built to support OpenStack.

Moving into the enterprise space gives Cray access to an expanded base of potential customers. Waite said the company is targeting a range of markets, including financial services, engineering, life sciences and cybersecurity, where the need to quickly analyze large amounts of data is acute.