If IT were a television show, it would be “Hoarders.” Organizations are creating and storing more and more data every day, and they’re having difficulty finding effective places to put it all.
In fact, according to research by IDC, by 2020 we will hit the 44 zettabyte mark, with about 80 percent of the data not in databases. With such unprecedented data growth, IT teams are looking for flexible, scalable, easily manageable ways to preserve and protect that data. This is where object storage shines.
Object storage (also known as object-based storage) is a storage architecture that manages data as objects, as opposed to other storage architectures such as file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier.
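As a rough, hypothetical sketch (not any particular vendor’s format), an object can be modeled as exactly those three pieces: a payload, a set of metadata, and a globally unique key in a flat namespace:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    """Minimal sketch of what an object bundles together."""
    data: bytes                                   # the payload itself
    metadata: dict = field(default_factory=dict)  # arbitrary key/value metadata
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # globally unique identifier

# Example: a sensor reading stored with descriptive metadata (names are illustrative)
obj = StoredObject(
    data=b'{"temperature": 21.7}',
    metadata={"content-type": "application/json", "source": "sensor-42"},
)
print(obj.object_id)  # a flat address in a single namespace, not a directory path
```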
Companies that specialize in—or at least offer—object storage options include Cloudian, Pure Storage, Digital Ocean, IBM/Cisco, Dell EMC Virtustream, Spectra Logic, SwiftStack, Qumulo, Minio, NetApp, Hitachi Data Systems, Cohesity and Veritas, among others.
Michael Tso, CEO and Co-Founder of Cloudian and a man who knows his market well, gave eWEEK his take on why he believes object storage systems are the most efficient option for big data workloads—including the machine learning and artificial intelligence use cases that are becoming more common all the time.
Here are eight specific storage requirements of these data sets, and why AI and ML applications demand the data management capabilities supplied by enterprise object storage solutions.
Storage Requirement No. 1: Scalability
AI systems can process vast amounts of data in a short timeframe, and larger training data sets generally yield better models. The combination drives significant storage demands. Microsoft taught computers to speak using five years of continuous speech recordings. Tesla is teaching cars to drive with 1.3 billion miles of driving data. Managing data sets like these requires a storage system that can scale without limits.
How Object Storage Helps: Object storage is the only storage type that scales limitlessly within a single namespace. Plus, the modular design allows storage to be added at any time, so you can scale with demand, rather than ahead of demand.
Storage Requirement No. 2: Cost Efficiency
A useful storage system must be both scalable and affordable, two attributes that don’t always coexist in enterprise storage: historically, highly scalable systems have been more expensive on a cost-per-capacity basis.
How Object Storage Helps: Object storage is built on the industry’s lowest-cost hardware platforms. Add in low management overhead and space-saving data compression features, and the result is 70 percent lower cost than traditional enterprise disk storage.
Storage Requirement No. 3: Software-defined Storage Options
Vast data sets will sometimes require hyperscale data centers with purpose-built server architectures already in place. Other configurations may benefit from the simplicity of pre-configured appliances.
How Object Storage Helps: Object storage keeps your deployment options open, with your choice of storage appliances or software-defined storage.
Storage Requirement No. 4: Hybrid Architecture
Different data types have varying performance requirements, and the hardware must reflect that. Systems must include the right mix of storage technologies to meet the simultaneous needs for scale and performance, rather than a homogeneous approach that will fall short.
How Object Storage Helps: Object storage employs a hybrid architecture, with spinning disk for user data and SSDs for performance-sensitive metadata, thus optimizing both cost and performance.
Storage Requirement No. 5: Parallel Architecture
For data sets that grow without limits, a parallel-access architecture is essential. Otherwise, the system will develop choke points that limit growth.
How Object Storage Helps: Object storage employs a shared-nothing cluster architecture, which means that all parts of the system work in parallel. Data throughput grows continuously as the system expands.
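As a hedged illustration of what that parallelism means for a client, the sketch below issues concurrent byte-range reads of one large object over the S3 API; the endpoint, bucket and key are placeholders, and in a real shared-nothing cluster those requests would be spread across many nodes:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

# Placeholder endpoint for an S3-compatible object store; credentials come from the environment.
s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

BUCKET, KEY = "training-data", "videos/drive-0001.bin"   # hypothetical names
CHUNK = 64 * 1024 * 1024                                 # 64 MiB per range read

def fetch_range(offset: int) -> bytes:
    """Read one byte range; each request can be served by a different node."""
    resp = s3.get_object(Bucket=BUCKET, Key=KEY,
                         Range=f"bytes={offset}-{offset + CHUNK - 1}")
    return resp["Body"].read()

size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
offsets = range(0, size, CHUNK)

# Issue the range reads in parallel; aggregate throughput grows with the cluster.
with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(fetch_range, offsets))

blob = b"".join(parts)
```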
Storage Requirement No. 6: Data Durability
Backing up a multi-petabyte training data set is not feasible; it would usually be cost- and time-prohibitive. But you can’t leave it unprotected, either. Instead, the storage system needs to be self-protecting.
How Object Storage Helps: Object storage is designed with redundancy built in, so data is protected without requiring a separate backup process. Furthermore, you can select the level of data protection needed for each data type to optimize efficiency. Systems can be configured to tolerate multiple node failures or even the loss of an entire data center.
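The article doesn’t name the mechanisms, but object stores typically implement this redundancy with replication or erasure coding. A back-of-the-envelope sketch of how the choice of protection level trades failure tolerance against raw capacity, which is roughly what selecting a protection level per data type amounts to in practice:

```python
def raw_capacity_needed(usable_tb: float, data_shards: int, parity_shards: int) -> float:
    """Raw capacity required to store `usable_tb` with a k+m erasure-coding layout.

    Replication is the special case data_shards=1 (e.g. 1+2 means three full copies).
    The layout tolerates the loss of up to `parity_shards` shards (disks or nodes).
    """
    return usable_tb * (data_shards + parity_shards) / data_shards

for label, k, m in [("3x replication (1+2)", 1, 2),
                    ("erasure coding 4+2", 4, 2),
                    ("erasure coding 8+3", 8, 3)]:
    print(f"{label:22s} tolerates {m} failures, "
          f"needs {raw_capacity_needed(100, k, m):.0f} TB raw per 100 TB usable")
```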
Storage Requirement No. 7: Data Locality
While some training data will reside in the cloud, much of it will remain in the data center for a variety of reasons: performance, cost, and regulatory compliance are three of them. To be competitive, on-premises storage must offer the same cost and scalability benefits as its cloud-based counterparts.
How Object Storage Helps: Object storage is the storage of the cloud; it’s what many providers use to build their public cloud infrastructure. Cloud scalability and economics are now available on-premises.
Storage Requirement No. 8: Cloud Integration
Regardless of where data resides, cloud integration will still be an important requirement, for two reasons. First, much of the AI/ML innovation is occurring in the cloud, so on-premises systems that are cloud-integrated will provide the greatest flexibility to use cloud-native tools. Second, we are likely to see a fluid flow of data to and from the cloud as information is generated and analyzed. An on-premises solution should simplify that flow, not limit it.
How Object Storage Helps: Object storage should be cloud-integrated in three ways. First, solutions may employ the S3 API, the de facto standard language of cloud storage. Second, they may facilitate tiering to and from the Amazon, Google and Microsoft public clouds, and let you view local and cloud-based data within a single namespace. Third, data stored in the cloud should be accessible directly from cloud-based applications. This bi-modal access lets you employ cloud and on-premises resources interchangeably.
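As a minimal sketch of that first point (the endpoint URL, bucket name, key and metadata below are placeholders, not anything from the article), the same standard S3 client code can target an on-premises object store or a public cloud simply by changing the endpoint:

```python
import boto3

# Placeholder endpoint for an on-prem, S3-compatible store; drop endpoint_url
# (or point it elsewhere) and the identical code talks to a public cloud bucket.
s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

# Write a training artifact with descriptive metadata...
s3.put_object(
    Bucket="models",
    Key="resnet50/weights-v1.bin",
    Body=b"\x00" * 1024,  # stand-in payload
    Metadata={"framework": "pytorch", "epoch": "42"},
)

# ...and read it back with the same, standard API calls.
resp = s3.get_object(Bucket="models", Key="resnet50/weights-v1.bin")
weights = resp["Body"].read()
print(len(weights), resp["Metadata"])
```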
Realizing the full potential of AI/ML requires an infrastructure that supports innovation. Today’s object storage solutions should deliver the scalability, cost efficiency and interoperability that enhance the capabilities of these emerging technologies.