IBM, AMD, ARM, Others Look to Unite CPUs, Accelerators

The group is creating the CCIX interconnect to enable faster processing of emerging workloads such as data analytics, machine learning and 5G.


IBM, ARM and Advanced Micro Devices are among the tech vendors teaming up to create a single data center interconnect fabric that will enable chips and accelerators from different vendors to communicate without the need for complex programming.

The new Cache Coherent Interconnect for Accelerators (CCIX) will make servers more efficient and better able to run emerging data center workloads such as big data analytics, machine learning, 4G and 5G wireless networking, video analytics and network-functions virtualization (NFV), according to officials with the companies involved.

Other vendors involved are Huawei Technologies, Mellanox Technologies, Qualcomm and FPGA maker Xilinx, which announced that the new product roadmap for its 16-nanometer UltraScale+ FPGAs will include offerings with integrated High-Bandwidth Memory (HBM) and support for CCIX.

The CCIX fabric will allow the various chips, accelerators and networking silicon—from CPUs and graphics chips to field-programmable gate arrays (FPGAs)—to seamlessly move data between the systems powered by these technologies. The new workloads demand increasingly fast and efficient processing, and systems makers are turning to accelerators like FPGAs, GPUs and digital signal processors (DSPs) to work with CPUs—whether x86 chips from Intel and AMD or IBM Power processors or ARM-based systems-on-a-chip (SoCs)—to more quickly, efficiently and affordably run the applications.

A single interconnect fabric specification that lets processors with disparate instruction set architectures communicate will accelerate these capabilities.

"CCIX enables greater performance and connectivity capabilities over existing interconnects, and actually paves the road to the next generation CPU-accelerator-network standard interface," Gilad Shainer, vice president of marketing at Mellanox, said in a statement. "With an anticipated broad eco-system support of the CCIX standard, data centers will now be able to optimize their data usage, thereby achieving world-leading applications efficiency and scale."

Lakshmi Mandyam, director of server systems and ecosystems at ARM, said in a statement that a "'one size fits all architecture' approach to data center workloads does not deliver the required performance and efficiency. CCIX enables more optimized solutions by simplifying software development and deployment of applications that benefit from specialized processing and hardware off-load, delivering higher performance and value to data center customers."

The idea of CCIX makes a lot of sense at a time when CPUs can no longer be counted on to accelerate application performance on their own, according to Karl Freund, a senior analyst with Moor Insights and Strategy.

"This will be no small task," Freund wrote in a column in Forbes. "It is hard enough to build a cache coherent interface between two or four homogeneous chips like CPUs. Building one that allows devices to share data across disparate implementations of CPUs, FPGAs, GPUs and network chips will be a monumental challenge. However the potential benefits could be tremendous if they can pull this off, providing plug-and-play compute and network acceleration for whatever processor you choose, while providing much better performance than is available today using the PCIe interconnect upon which today's system depend."

PCIe has been around for more than a decade, but it was not designed to meet the high-bandwidth, low-latency demands of communications between CPUs, he wrote. What's needed is a bus or fabric with shared high-speed memory where the CPU and accelerators "all behave as first class citizens."

Vendors have already done work to improve communications between CPUs and accelerators. Intel last year bought FPGA maker Altera for $16.7 billion and is expected to create an architecture to enable such communications. In addition, IBM is using its Cache-Coherent Accelerator Processor Interconnect (CAPI) technology to improve connectivity between its Power8 chips and Xilinx's FPGAs, while Nvidia's NVLink is used to improve connectivity performance between IBM's Power architecture and Nvidia's GPUs.

The problem, according to Freund, is that technologies like CAPI and what Intel is working on are vendor-specific. What's needed is a more collaborative approach.

"IBM OpenPOWER, Advanced Micro Devices x86, ARM partners AMD, Qualcomm and Huawei could each go it alone, or they can join forces," he wrote. "To not collaborate would cede a substantial advantage to Intel and risk fragmentation that the industry would not accept."