Avi Kivity is well-known in the open-source and Linux communities as the original lead developer of the widely deployed KVM hypervisor. In 2012, Kivity started a company called Cloudius Systems, which develops the OSv operating system for the cloud. Today, Cloudius is being rebranded and refocused under the name ScyllaDB.
ScyllaDB is a new NoSQL column store database that is compatible with the Apache Cassandra database APIs. As such, the intention is that ScyllaDB can be used as a drop-in replacement for Cassandra, offering users the benefit of improved performance and scale.
“We’re building a really fast database for NoSQL workloads,” Kivity told eWEEK. “ScyllaDB is 100 percent compatible with Cassandra, and applications will run up to 10 times faster.”
The primary innovation in ScyllaDB is its architecture. A common approach across many types of databases is to “shard” the data, that is, to split it into multiple slices, or shards. ScyllaDB goes further: rather than sharding only the data, it shards the database itself.
“Cassandra already uses sharding with each node already responsible for a subset of the data,” Kivity said. “What we’re doing is we’re applying the sharding idea inside of the node itself.”
Kivity explained that a ScyllaDB node can be composed of multiple CPU sockets, with each CPU containing multiple cores. Each core in a ScyllaDB deployment can handle its own subset of the data, enabling very efficient processing.
“By making each core to be responsible for a subset of the data, we are able to minimize communication between CPU cores, improving the performance of the node,” Kivity said.
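The idea Kivity describes, sharding across nodes and then again across the cores inside each node, can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the names `token_for`, `place`, `Node` and `Cluster` are invented, the SHA-256 hash is a stand-in, and the modulo mapping is a simplification rather than ScyllaDB's actual partitioning scheme.

```python
import hashlib
from typing import List, Optional, Tuple


def token_for(key: str) -> int:
    """Derive a stable 64-bit token from a key (stand-in hash, an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, num_nodes: int, cores_per_node: int) -> Tuple[int, int]:
    """Return the (node, core) pair that owns a key.

    The first level picks a node, as a node-sharded database already does.
    The second level picks a core inside that node, which is the extra
    step described in the article.
    """
    token = token_for(key)
    node = token % num_nodes
    core = (token // num_nodes) % cores_per_node
    return node, core


class Node:
    """A node holding one private store per core; cores share nothing."""

    def __init__(self, cores: int) -> None:
        self.core_stores: List[dict] = [dict() for _ in range(cores)]


class Cluster:
    """A toy cluster that routes every read and write to the owning core."""

    def __init__(self, num_nodes: int, cores_per_node: int) -> None:
        self.num_nodes = num_nodes
        self.cores_per_node = cores_per_node
        self.nodes = [Node(cores_per_node) for _ in range(num_nodes)]

    def put(self, key: str, value: object) -> None:
        node, core = place(key, self.num_nodes, self.cores_per_node)
        self.nodes[node].core_stores[core][key] = value

    def get(self, key: str) -> Optional[object]:
        node, core = place(key, self.num_nodes, self.cores_per_node)
        return self.nodes[node].core_stores[core].get(key)
```

Because each key maps to exactly one core's private store, a request for that key never needs a lock or a message to another core, which is the communication saving the quote refers to.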
At Cloudius Systems, the core product was the OSv operating system for the cloud, which can be used as the basis to run ScyllaDB, though Kivity noted that it’s not required.
“For bare metal deployment there is no need for OSv as we have a kernel bypass technology that allows ScyllaDB to directly control the NIC [Network Interface Card] from the application,” Kivity explained. “ScyllaDB can run on both bare metal and virtual deployment; it doesn’t require OSv, and we can achieve a high degree of performance on Linux itself.”
The Intel DPDK (Data Plane Development Kit) provides a mechanism to control a network card directly from an application. ScyllaDB uses an open-source framework called SeaStar, which builds on DPDK. SeaStar helps to enable multi-queue communications, effectively splitting up the network card into multiple segments, which further enables ScyllaDB’s sharding.
“So each CPU core gets its own network queue to the network card, which also helps us to enable high performance,” Kivity said.
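A rough way to picture one network queue per core is a dispatcher that hashes each connection to a fixed queue, so that all packets of a given connection are handled by the same core. The Python sketch below is a conceptual model only: it does not use DPDK or SeaStar, the names `Flow`, `queue_for_flow` and `dispatch` are hypothetical, and CRC32 stands in for whatever hash a real network card applies in hardware.

```python
import zlib
from typing import Iterable, List, NamedTuple


class Flow(NamedTuple):
    """Identifies one connection by its source and destination endpoints."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def queue_for_flow(flow: Flow, num_queues: int) -> int:
    """Map a flow to a queue index with a stable hash (CRC32 is an assumption)."""
    data = f"{flow.src_ip}:{flow.src_port}->{flow.dst_ip}:{flow.dst_port}"
    return zlib.crc32(data.encode("utf-8")) % num_queues


def dispatch(flows: Iterable[Flow], num_cores: int) -> List[List[Flow]]:
    """Place each flow on the queue owned by exactly one core."""
    queues: List[List[Flow]] = [[] for _ in range(num_cores)]
    for flow in flows:
        queues[queue_for_flow(flow, num_cores)].append(flow)
    return queues


if __name__ == "__main__":
    # Example clients connecting to port 9042, the CQL native protocol port.
    clients = [Flow("10.0.0.%d" % i, 40000 + i, "10.0.1.1", 9042) for i in range(8)]
    for core, queue in enumerate(dispatch(clients, num_cores=4)):
        print(core, [f.src_ip for f in queue])
```

In this model a core only ever reads from its own queue, so the network path needs no cross-core coordination.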
Because ScyllaDB aims to be a drop-in replacement for the Apache Cassandra database, Kivity emphasized that the plan is to keep pace with the Cassandra APIs and make use of standard drivers.
“We do plan to play well with the community and not try to break up the protocol and APIs,” Kivity said. “We want it to be simple for users so they don’t have to worry about compatibility.”
Sean Michael Kerner is a senior editor at eWEEK and InternetNews.com. Follow him on Twitter @TechJournalist.