Data Storage Demands Complicated by Data Variety

A survey of data scientists reveals that it is not only the volume of data but also the variety of data types collected that is causing storage headaches.


Nearly three-quarters (71 percent) of data scientists said big data had made their analytics more difficult, and that data variety, not volume, was to blame, according to a survey of 111 data scientists from Paradigm4, a designer of computational database management systems.

The survey results indicated that the diversity of data types, rather than the volume, is the bigger challenge for data scientists and is causing them to leave data on the table.

The report also revealed 39 percent said their job had become more stressful with the growth of big data, while nearly half of data scientists (49 percent) said they're finding it more difficult to fit their data into relational database tables.

"The increasing variety of data sources is forcing data scientists into shortcuts that leave data and money on the table," Marilyn Matz, CEO of Paradigm4, said in a statement. "The focus on the volume of data hides the real challenge of data analytics today. Only by addressing the challenge of utilizing diverse types of data will we be able to unlock the enormous potential of analytics."

The vast majority (91 percent) said they are using complex analytics on their big data now or plan to within the next two years. However, 36 percent said it takes too long to get insights because the data is too big to move to their analytics software.

Finally, the report indicated that despite the hype around the Apache Hadoop software platform, fewer than half of respondents (48 percent) have used Hadoop or Spark, and of those, 76 percent said it was too slow, took too much effort to program or had other limitations.

Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware, and Spark is an open-source cluster computing framework for data analytics.

A June report from TwinStrata also indicated that businesses struggle to cope with the deluge of data flooding their storage systems, and that management of inactive data (defined as data unused for six months or more) is an area where the greatest improvements can be made.

Analysis of the data revealed that the majority of organizations continue to use expensive primary storage systems to store infrequently accessed data, and as a result, respondents engaging in this practice spend significantly more of their annual IT budget on storage than their peers.

Three out of five organizations replace their storage systems within five years. The most-cited reasons for storage system replacement were growing capacity needs (54 percent), manufacturer end of life (49 percent), new technology (38 percent), and high maintenance costs (30 percent).