Subtree Debuts Dotmesh and Dothub for Cloud Native Data Management

The startup raises $10 million in a bid to give developers more control over application data in Docker and Kubernetes cloud-native environments.


Container and cloud-native data management startup Subtree emerged from stealth on Feb. 7, alongside a $10 million seed round of investment and two initial products that bring data control to Docker and Kubernetes container environments.

Subtree is led by founder and CEO Luke Marsden, who had previously created the open-source Flocker container storage project and the associated company ClusterHQ, which ceased operations in December 2016. Subtree is tackling a different challenge than Flocker did, aiming to provide control for container data with its dotmesh and dothub efforts.

"With microservices, it's now very difficult to share the state of an application," Marsden told eWEEK. "So what we're saying with dotmesh is it would be great if it were possible to easily capture the state of multiple microservices that make up an application in a single atomic unit."

A key challenge with container applications, according to Marsden, is that they are typically deployed as multiple microservices. He noted that polyglot persistence is common with microservices: each microservice that needs to store state has its own database.

With dotmesh, a developer can take a snapshot of a container application at a given point in time. The snapshot, called a datadot, captures the state of all the different files and databases that make up the application. Marsden wants dotmesh to work much the way developers already work with git, the popular open-source version control system, in which users can easily push, pull, fork and share code.

"The datadot that captures the state of an application can be pushed up to the datahub," Marsden said.

Dothub is the central hosted repository for datadots and is a commercial service offered by Subtree. It has a free tier that provides up to 1GB of storage, a $10-a-month tier that provides 5GB and a Team tier that provides 10GB for $20 a month. Subtree also offers open-source dotmesh hub code for those who want to host and store datadots on their own infrastructure.
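To make the datadot idea more concrete, the following Python sketch models it in memory. It is not dotmesh's implementation or API: the Datadot and Hub classes, their commit, rollback, push and clone methods, and the dictionary standing in for each microservice's volume are all hypothetical, chosen only to illustrate how one atomic snapshot can cover several services' data and then be shared through a hub in a git-like way.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Datadot:
    """Hypothetical model: one dot holds the data of several microservices."""
    name: str
    # Hypothetical stand-in for volumes: service name -> {file path: contents}
    volumes: dict = field(default_factory=dict)
    commits: list = field(default_factory=list)

    def commit(self, message: str) -> str:
        # Copy every service's state in one step, so the snapshot is a single unit
        state = copy.deepcopy(self.volumes)
        parent = self.commits[-1]["id"] if self.commits else None
        payload = json.dumps(
            {"parent": parent, "message": message, "state": state}, sort_keys=True
        )
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits.append({"id": commit_id, "message": message, "state": state})
        return commit_id

    def rollback(self, commit_id: str) -> None:
        # Restore all services together to the chosen snapshot
        for c in self.commits:
            if c["id"] == commit_id:
                self.volumes = copy.deepcopy(c["state"])
                return
        raise KeyError(f"unknown commit {commit_id}")


class Hub:
    """Hypothetical stand-in for a hosted repository of datadots."""

    def __init__(self) -> None:
        self._dots = {}

    def push(self, dot: Datadot) -> None:
        # Store a private copy of the dot's full commit history
        self._dots[dot.name] = copy.deepcopy(dot.commits)

    def clone(self, name: str) -> Datadot:
        commits = copy.deepcopy(self._dots[name])
        # A fresh clone starts from the latest pushed snapshot
        latest = copy.deepcopy(commits[-1]["state"]) if commits else {}
        return Datadot(name=name, volumes=latest, commits=commits)


if __name__ == "__main__":
    dot = Datadot("shop")
    dot.volumes = {
        "orders-db": {"orders.csv": "1,widget"},
        "users-db": {"users.csv": "alice"},
    }
    first = dot.commit("seed data")
    # Damage both services' data, then roll both back at once
    dot.volumes["orders-db"]["orders.csv"] = "corrupted"
    dot.volumes["users-db"]["users.csv"] = ""
    dot.rollback(first)
    print(dot.volumes["orders-db"]["orders.csv"])  # 1,widget
```

Because commit copies every service's data in one step, rolling back restores the orders and users data together, which is the property Marsden describes as capturing an application's state "in a single atomic unit." A production system would presumably snapshot volumes at the storage layer rather than copying them in memory.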

Cloud Native

While dotmesh is intended to help enable data control in a cloud-native environment, the project is not currently part of the Cloud Native Computing Foundation (CNCF), the Linux Foundation effort that is home to Kubernetes and multiple other cloud-native open-source projects. Marsden said, however, that his company's work is sympathetic to the CNCF's goal of building cloud-agnostic software.

When it comes to container configuration and control, developers use multiple tools today, including the Chef, Puppet and Ansible configuration management systems, as well as Helm, the package manager for Kubernetes. In Marsden's view, all of those projects belong to the infrastructure control space, while dotmesh is really about data management.

"Data management is a different story, and it's about the volumes that you attach to container images," he said.

Marsden said Docker and Kubernetes both do a good job of forcing developers to define in application manifests the difference between where the application runs and where its data lives. A developer will declare that everything stored in a particular directory is a volume, while everything outside that directory is part of the immutable container image, which is ephemeral and can be discarded when the container is upgraded or migrated to a different system.

"That dividing line is the line between infrastructure management and data management," Marsden said.

Lessons Learned From ClusterHQ

Marsden's previous container effort with Flocker and ClusterHQ did not end successfully, but there were lessons learned from the experience that will help to inform his new company Subtree.

In a 2014 interview with eWEEK when Flocker got started, Marsden said the project was an effort to solve the data problem for Docker containers. As it turned out, Flocker was a bit too early for the market, which has matured significantly since 2014.

"Flocker and ClusterHQ emerged at the very beginning of the container revolution," he said. "A lot of the work that Flocker had to do was very basic around simply connecting containers with storage."

Marsden said that one of the mistakes made with Flocker and ClusterHQ was believing there was a market for the product before one truly existed. He added that by the time Kubernetes commoditized what Flocker had pioneered, the company was unable to pivot. At Subtree, Marsden said, the company is taking a more agile approach and aims not to scale until there is a true market fit.

"We're being a lot more rigorous about not believing hype and being extremely reflective about market fit and when it really is time to start scaling," Marsden said.

Sean Michael Kerner is a senior editor at eWEEK and InternetNews.com. Follow him on Twitter @TechJournalist.
