IBM has developed a new approach to storage known as cognitive storage, in which computers can be taught to distinguish between high-value and low-value data.
In essence, the concept helps computers to learn what to remember and what to forget, IBM said.
In a blog post and a new paper, both published today in conjunction with the IEEE journal Computer, IBM Research scientists Giovanni Cherubini, Jens Jelitto and Vinodh Venkatesan introduced the concept of cognitive storage. The initiative is still a research project and is not yet available, but it could be very soon, the company said.
Cognitive storage draws a distinction analogous to the one the human brain makes between memories and mere information. That differentiation can then be used to determine what is stored, where it is stored and for how long, IBM said.
“With rising costs in energy and the explosion in big data, particularly from the Internet of Things, this is a critical challenge as it could lead to huge savings in storage capacity, which means less media costs and less energy consumption,” the IBM post said.
The idea is based on a metric known as data value, which is analogous to the value of a piece of art: the higher the demand and the rarer the piece, the higher its value typically is, and the tighter the security it requires, IBM said.
“The concept of cognitive storage goes beyond existing approaches not only by starting from a data perspective and taking workload characteristics into account, but also by introducing the value of data as a main determinant for storage configuration and management, data placement, data protection, and data lifecycle management,” the IBM researchers said in their paper. “This concept allows us to design an elastic and dynamic storage system that is capable of storing data more efficiently by providing high redundancy only for the most relevant data and by saving storage space by storing less relevant data with reduced redundancy. The value of data, besides its popularity and workload characteristics, may also be used to determine the level of service provided.”
For example, IBM said if 1,000 employees are accessing the same files every day, the value of that data set should be very high. A cognitive storage system would learn this and store those files on fast media like flash. In addition, the system would automatically back up these files multiple times, IBM said. Lastly, the files may warrant extra security so they cannot be accessed without authorization, the post said.
The opposite also holds. A data set that is rarely accessed, like PDF files of 20-year-old tax documents, should be stored on cold media like tape and made available only upon request, IBM said. A cognitive storage system would also know that tax records need to be kept for at least seven years and that they can be deleted after that period, IBM said. In addition, in many situations data value can change over time, and a cognitive storage system can adapt accordingly, the post said.
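The hot and cold examples above can be sketched as a simple value-to-policy mapping. This is an illustrative sketch only: the tier names, score thresholds, and replica counts are invented for the example, not details of IBM's system.

```python
# Hypothetical sketch of value-based placement as described in the article:
# high-value data goes to fast media with extra redundancy, low-value data
# to cold media with a retention period. All thresholds are assumptions.

from dataclasses import dataclass


@dataclass
class Placement:
    tier: str          # e.g. "flash", "disk", or "tape"
    replicas: int      # redundancy level (extra backups for hot data)
    retain_years: int  # 0 means keep indefinitely


def place(value_score: float, is_tax_record: bool = False) -> Placement:
    """Map a data-value score in [0, 1] to a storage policy."""
    if value_score >= 0.8:
        # Hot: e.g. files 1,000 employees touch daily -> flash, extra copies
        return Placement(tier="flash", replicas=3, retain_years=0)
    if value_score >= 0.3:
        # Warm: ordinary working data
        return Placement(tier="disk", replicas=2, retain_years=0)
    # Cold: rarely accessed; a retention rule (e.g. seven years for tax
    # records) lets the system delete the data after the legal period
    return Placement(tier="tape", replicas=1,
                     retain_years=7 if is_tax_record else 0)


print(place(0.95))        # shared daily files: flash with 3 replicas
print(place(0.05, True))  # 20-year-old tax PDFs: tape, deletable after 7 years
```

A real system would derive the score continuously rather than take it as an input, but the mapping from value to tier, redundancy, and lifecycle is the core of the concept the researchers describe.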
“We talk about big data being the world’s newest natural resource and like precious metals, the price fluctuates, well so does the value of the data,” said Vinodh Venkatesan, a data scientist at IBM Research, according to the IBM post. “But the challenge is coming up with the right value. The price of gold is modeled after the financial markets, but how do we determine the value of a spreadsheet within an enterprise?”
To determine this value, IBM tracked the access patterns of data or the frequency it is used. The researchers also added metadata tags to the data to help train the system, depending on the context in which the data is used. For example, an astronomer may tag a data set coming from the Andromeda galaxy as highly important or less important. In fact, astronomy is what inspired IBM scientists to come up with the idea of cognitive storage, the post said.
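The two training signals mentioned above, access frequency and user-supplied metadata tags, can be combined by any simple supervised learner. The sketch below uses a minimal logistic regression written from scratch; the features, labels, and learner are illustrative assumptions, not IBM's actual method.

```python
# Hypothetical sketch: learn a data-value score from access frequency and a
# metadata tag, the two signals the article describes. The tiny logistic
# regression here is an illustrative stand-in for whatever model IBM uses.

import math


def train(samples, labels, lr=0.5, epochs=3000):
    """samples: feature vectors; labels: 1 = high value, 0 = low value."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted value probability
            err = p - y                       # gradient of the log loss
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Features: (normalized access frequency, owner tagged it important: 1/0).
# Labels come from the kind of user tagging the astronomy example describes.
X = [(0.9, 1), (0.8, 1), (0.85, 0), (0.01, 0), (0.02, 0), (0.05, 1)]
y = [1, 1, 1, 0, 0, 1]
w, b = train(X, y)

print(predict(w, b, (0.95, 1)))  # frequently accessed and tagged: high score
print(predict(w, b, (0.02, 0)))  # rarely touched, untagged: low score
```

The point of the sketch is the division of labor: access patterns are observed automatically, while tags let a domain expert, such as the astronomer in IBM's example, inject context the system could not infer on its own.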
Yet, “the establishment of cognitive storage will depend on the ability of storage and information scientists to identify the principles underlying a broad definition of the value of data, and on the emergence of truly data-centric storage/file systems,” the IBM researchers said in their paper. “The emerging field of Infonomics, which provides a framework for how to assert economic significance to information, is a promising approach to define data value in a business value context. However, is also a broader data value definition possible that takes into account aspects such as the subjectivity of the value of data, its context and cultural dependence, its time dependence, and also aspects of exclusivity and fairness?”
For his part, Charles King, principal analyst at Pund-IT, said IBM's cognitive storage leverages continuing, automated analysis of users' behavior to determine whether data or files are "hot" — meaning they need to be supported with high-performance storage systems/media — or cold — meaning they can be archived in lower cost/performance arrays.
“So if large/growing numbers of employees or customers are reading a report or downloading a particular file, the cognitive system will keep it readily available,” King said. “After its popularity wanes, it can be consigned to the storage equivalent of a deep freeze. In essence, IBM is aiming to replace traditional storage tiering solutions — which typically use data preferences/classifications that are fixed/managed by storage admins — with a cognitive solution that is sensitive/responsive to end user behavior/preferences, and can also be programmed to ‘retire’ data according to predetermined guidelines. It’s an interesting approach that leverages both IBM’s long term storage R&D and its growing cognitive assets.”
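The waning popularity King describes can be tracked with a time-decayed access counter: each read bumps a score that decays continuously, and a file is demoted once the score falls below a threshold. The half-life, threshold, and class below are illustrative assumptions, not details from IBM's system.

```python
# Sketch of popularity-driven demotion: an access counter whose value decays
# by half every `half_life_days` days. When the decayed score drops below a
# threshold, the file would be moved to cold storage. Parameters are invented.

import math


class PopularityTracker:
    def __init__(self, half_life_days: float = 30.0):
        self.rate = math.log(2) / half_life_days  # exponential decay rate
        self.score = 0.0
        self.last_t = 0.0

    def access(self, t_days: float) -> None:
        """Record one access at time t_days (days since tracking began)."""
        elapsed = t_days - self.last_t
        self.score = self.score * math.exp(-self.rate * elapsed) + 1.0
        self.last_t = t_days

    def current(self, t_days: float) -> float:
        """Decayed popularity score at time t_days, without recording a hit."""
        return self.score * math.exp(-self.rate * (t_days - self.last_t))


HOT_THRESHOLD = 5.0  # assumed cutoff between "keep readily available" and "deep freeze"

tracker = PopularityTracker()
for day in range(30):                # a month of daily reads
    tracker.access(float(day))

print(tracker.current(30.0) > HOT_THRESHOLD)   # True: still hot right after
print(tracker.current(300.0) > HOT_THRESHOLD)  # False: nine idle months, demote
```

Because the score decays on its own, no administrator has to reclassify the file; idleness alone consigns it to the "deep freeze," which is the behavioral tiering King contrasts with fixed admin-managed classifications.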