How and Why Big Data is Fast Becoming Small Data Sprawl

Edge devices will gather, analyze and store information about the user, their environment and their response to it. The result is that more of your information will be in devices that you can (and cannot) see. This is “small data sprawl”: slices of information about you spread all over the environment. It's already happening.


Organizations want to move closer to their customers. Proximity enables them to be more responsive and personal. It also allows them to have more control over the relationship. Because IoT devices now have enough capability to solve real problems, every company is building an edge strategy.

Data is at the heart of any edge computing strategy. The edge devices will gather, analyze and store information about the user, their environment and their response to it. The result is that more of your information will be in devices that you can (and cannot) see. This is “small data sprawl”: slices of information about you spread all over the environment.

Over the coming years, the market will shift its focus from big data to small data sprawl. Big data was easier to control, manage and analyze. It was stored in a central data lake, where data custodians secured the data and a handful of data scientists acted on it. Small data sprawl increases the value and the risk associated with your data. Companies, regulatory bodies and especially individuals need to prepare for a world of small data sprawl.

Industry information for this eWEEK Data Points article comes from Druva Chief Technologist Stephen Manley.

Data Point No. 1: The number of edge devices (especially IoT) is exploding

Analysts believe there are about 20 million edge devices in the world, and the number is growing exponentially. While most people think of smart meters, cars and wearables, IoT and edge have spread to every industry. Farmers, medical device makers and industrial manufacturers constantly gather telemetry; governments, casinos and retail companies do the same with video.

Data Point No. 2: The amount of data those devices generate is growing

Engineers and scientists always want more data. Even if they can’t use it now, they want historical data to mine in the future. Therefore, the amount of data generated per device is skyrocketing. Video and audio, already more data-intensive than telemetry, are growing as definition increases. Telemetry devices, not to be left behind, are generating more data and gathering it more frequently. Cars already generate 25GB per hour, and that number is increasing.
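To put the per-car figure in network terms, here is a rough conversion. It is a back-of-the-envelope sketch only: it assumes decimal gigabytes (1GB = 10^9 bytes) and a hypothetical eight hours of driving per day, neither of which comes from the figure quoted above.

```python
GB = 10**9  # assumption: decimal gigabytes, not GiB


def sustained_rate_mbps(gb_per_hour: float) -> float:
    """Convert a GB-per-hour data rate into megabits per second."""
    bytes_per_second = gb_per_hour * GB / 3600
    return bytes_per_second * 8 / 10**6


def daily_volume_gb(gb_per_hour: float, hours_per_day: float) -> float:
    """Total data produced in one day at the given rate."""
    return gb_per_hour * hours_per_day


rate = sustained_rate_mbps(25)   # the 25GB-per-hour figure for cars
daily = daily_volume_gb(25, 8)   # 8 hours/day is a hypothetical duty cycle
print(f"{rate:.1f} Mbit/s sustained, {daily:.0f} GB per day")
```

Under those assumptions a single car would need a sustained uplink of roughly 56 Mbit/s to stream everything it generates, which helps explain why raw data tends to stay on the device at first.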

Data Point No. 3: Initial processing has to be done at the edge

Edge devices are becoming full-fledged computers because the initial processing has to be done locally. If you are automatically steering a car or controlling a pacemaker, you cannot depend on a slow, unreliable network. If you want to identify crimes or environmental issues, you cannot wait for central processing resources. Therefore, real-time computing will be done on the edge device itself. The result of this is small data sprawl – data living everywhere.

Data Point No. 4: Machine learning needs to be done at the center

Edge devices can execute algorithms, but machine learning can only be done at the center. To learn, the systems need access to a complete set of data, across many devices. They also need to apply more compute resources for a longer period of time. The edge will optimize for streaming; the center will optimize for analytics, enrichment and learning. That means the edge will need to send data to the center.
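The division of labor described here can be sketched in a few lines. This is an illustration only, not a description of any vendor's product: the `EdgeDevice` and `Center` classes, the threshold rule and the 1.5 margin are all hypothetical, and the "learning" step is a stand-in (an average across devices) for real model training.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EdgeDevice:
    """Hypothetical edge device: decides locally, ships data later."""
    device_id: str
    threshold: float
    buffer: list = field(default_factory=list)

    def on_reading(self, value: float) -> bool:
        # Real-time decision made on the device, with no network round trip.
        self.buffer.append(value)
        return value > self.threshold

    def flush(self) -> dict:
        # Hand the buffered readings to the center and clear local storage.
        batch = {"device_id": self.device_id, "readings": list(self.buffer)}
        self.buffer.clear()
        return batch


class Center:
    """Hypothetical central service that sees data from every device."""

    def __init__(self) -> None:
        self.readings: dict = {}

    def ingest(self, batch: dict) -> None:
        self.readings.setdefault(batch["device_id"], []).extend(batch["readings"])

    def learn_threshold(self, margin: float = 1.5) -> float:
        # Stand-in for training: derive a new threshold from all devices' data.
        all_values = [v for vals in self.readings.values() for v in vals]
        return mean(all_values) * margin


center = Center()
a, b = EdgeDevice("a", threshold=10.0), EdgeDevice("b", threshold=10.0)
alerts = [a.on_reading(v) for v in (4.0, 12.0)]
b.on_reading(6.0)
b.on_reading(2.0)
center.ingest(a.flush())
center.ingest(b.flush())
new_threshold = center.learn_threshold()
```

Each device reacts to its own readings immediately, while only the center, which holds readings from both devices, can compute a value that reflects the whole fleet.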

Data Point No. 5: Retention needs to be done at the center

IoT data is subject to compliance rules that require consolidated data and centralized control. Organizations know they need to secure, retain and delete data (e.g. private data). They will also, however, need to preserve the algorithms that generated conclusions about the data. From an algorithm trading stocks to a medical device adjusting insulin levels to a camera identifying a potential crime, courts will expect organizations to be able to reproduce their results. That will require both the original algorithm and the original dataset. This is just the beginning. As AI becomes more widespread, regulators will pay more attention.
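One way to picture the "original algorithm plus original dataset" requirement is a record that fingerprints both at the moment a decision is made. The sketch below is a minimal illustration under stated assumptions: the function names and record layout are hypothetical, and a real retention system would also have to store the algorithm and data themselves, not just their hashes.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest used to identify an exact algorithm or dataset version."""
    return hashlib.sha256(data).hexdigest()


def decision_record(algorithm_source: bytes, dataset: bytes, result: str) -> dict:
    """Hypothetical audit record tying a result to what produced it."""
    return {
        "algorithm_sha256": fingerprint(algorithm_source),
        "dataset_sha256": fingerprint(dataset),
        "result": result,
    }


def can_reproduce(record: dict, algorithm_source: bytes, dataset: bytes) -> bool:
    """True only if both retained artifacts match the recorded fingerprints."""
    return (
        record["algorithm_sha256"] == fingerprint(algorithm_source)
        and record["dataset_sha256"] == fingerprint(dataset)
    )


# Illustrative placeholder bytes, not real code or telemetry.
algo_v1 = b"def dose(glucose): return glucose * 0.1"
data_v1 = b"glucose=120,118,131"
record = decision_record(algo_v1, data_v1, result="dose adjusted")

same_inputs_ok = can_reproduce(record, algo_v1, data_v1)
changed_algo_ok = can_reproduce(record, b"def dose(glucose): return 0", data_v1)
```

If either the algorithm or the dataset has changed since the decision, the check fails, which is the situation an organization would want to avoid when asked to reproduce a result.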

Data Point No. 6: Cloud will be the target for consolidating small data

Organizations will consolidate edge data in the cloud, because it enables both machine learning and compliance at scale. Cloud is the only place with enough power, capacity and accessibility to store the data. It enables customers to apply powerful analytics tools, so they do not need to scour the market for scarce data scientists. Finally, cloud offers a centralized view across regional data centers, so cloud teams can centrally manage the data to comply with local regulations.

Data Point No. 7: Horizontal SaaS solutions will manage and classify small data

Customers will look to broad SaaS solutions to help manage and classify their data. Every industry will need to consolidate, protect and secure their data. They will also need to identify data (e.g. private data) that needs to be anonymized or purged. The combination of the scale of data and complexity of evolving regulations in multiple locations will convince organizations to offload the work to SaaS experts. These tools will gather the data, protect it, classify private data and help their customers find what they need, when they need it.

Data Point No. 8: Vertical solutions will run analytics on small data

Every industry and every organization will want to do something different with their data. That’s where they will create their competitive advantage. Therefore, we’ll see a rise of industry-specific SaaS tools for data analytics that companies augment with homegrown algorithms running on the cloud AI/ML infrastructure. By offloading the common infrastructure tasks, the leaders will focus their energy on building their secret sauce.

IoT and edge computing are transforming virtually every industry.

By moving closer to customers and employees, organizations unlock new opportunities to improve user experience and productivity. At the same time, the resulting small data sprawl demands a re-examination of how to manage and use data. Real-time analytics will move to the edge. Data protection and compliance will move to SaaS cloud applications. That will free up companies to build data and algorithmic enrichment using cloud tools.

As consumers, however, we will need to make sure that data privacy and compliance regulations keep pace with the new technology. Understanding the change is the first step.
