10 Benefits of Analyzing Data at the Edge in an IoT Environment
A tremendous volume of data is created at the source in most internet of things (IoT) environments. Historically, organizations wanting to analyze all that data immediately had few options, each with major drawbacks. The use cases for IoT continue to grow, and in many situations the volume of data generated at the edge demands bandwidth levels that can overwhelm available resources. Computing and analyzing IoT data close to the source is critical: it enables faster, more efficient local decision-making while allowing subsets of the data to be reliably transported to a central analytics deployment. Things are getting more, not less, distributed. In this eWEEK slide show, using industry information from MapR strategist Jack Norris, we share the benefits of running analytics at the edge in an IoT environment.
Faster Decision-Making
By putting analytical processing at the data source, you can take specific actions on a wide variety of events. In situations where even a few minutes of delay can be costly, an immediate response is vital. For example, if you have tunable cell phone antennas that can be repositioned to cover short-term hot spots, providing optimal coverage five minutes late is too late; by then, customers have already concluded your coverage is inadequate. Real-time analytics at the data source lets you respond within those narrow time windows.
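To make this concrete, here is a minimal Python sketch of acting on events at the source. It assumes a stream of antenna load readings and a hypothetical on_hotspot callback that repositions the antenna; the threshold and window size are illustrative, not values from any particular deployment.

```python
from collections import deque

LOAD_THRESHOLD = 0.85  # fraction of capacity that signals a hot spot (assumed)
WINDOW = 12            # consecutive readings, e.g. one minute at 5-second intervals

def watch_load(loads, on_hotspot):
    """Trigger a local action as soon as a sustained hot spot appears."""
    recent = deque(maxlen=WINDOW)
    for load in loads:
        recent.append(load)
        # Act only when the whole window is hot, so a single noisy reading
        # does not reposition the antenna.
        if len(recent) == WINDOW and min(recent) > LOAD_THRESHOLD:
            on_hotspot()   # local action; no round trip to a central cluster
            recent.clear()
```

Because the decision loop runs next to the antenna, response time is bounded by the sampling interval rather than by network latency to a central cluster.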
Space Constraints
Many IoT data sources have space limitations that make deploying full-sized servers a burden. Depending on the processing requirements, it may be impossible to run servers at remote locations at all, such as in cars or on medical devices. You need a system that can run efficiently on a small hardware footprint, such as today's minicomputers.
Overcome Bandwidth Constraints
Some IoT environments, such as oil wells and connected vehicles, generate data volumes that overwhelm available bandwidth, making it impractical to deliver all data to a central location for analysis. By putting analytics at the edge, you reduce bandwidth requirements because you no longer rely entirely on delivering raw data to a central analytics cluster. Edge processing also lets you summarize, down-sample and/or compress data before transmitting it to a primary analytics cluster.
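As a sketch of the down-sampling strategy, the following Python collapses raw time-stamped readings into per-window summaries before transmission. The window length and the summary fields are illustrative assumptions, not part of any specific product.

```python
from statistics import mean

def summarize_window(readings):
    """Collapse a window of (timestamp, value) readings into one summary record."""
    values = [v for _, v in readings]
    return {
        "start": readings[0][0],   # first timestamp in the window
        "end": readings[-1][0],    # last timestamp in the window
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }

def downsample(readings, window_seconds=60):
    """Yield one summary per time window instead of every raw reading."""
    window, window_start = [], None
    for ts, value in readings:
        if window_start is None:
            window_start = ts
        if ts - window_start >= window_seconds and window:
            yield summarize_window(window)
            window, window_start = [], ts
        window.append((ts, value))
    if window:
        yield summarize_window(window)  # flush the final partial window
```

Sending one summary per minute instead of, say, one reading per second cuts transmission volume by roughly two orders of magnitude while preserving the statistics a central cluster typically needs.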
Reliability
Edge deployments are typically in remote locations and are therefore much less accessible than on-premises or cloud deployments. Should any part of the analytics system fail, replacement or recovery is much more difficult, so reliability strategies become even more important for keeping downtime to a minimum. Not all technologies are suited to that challenge. You need a system that uses redundancy and failover, even in remote and space-constrained locations, to ensure continuity.
Selective Processing
Not all of the huge volume of data that collects at edge sources is valuable. If you can quickly separate interesting data from the mundane, you can focus rich analytics on the data that matters while reducing overall storage and transmission requirements. For example, the most meaningful data in a test run of a self-driving car is what was collected around the time a driver had to intervene; that anomaly should be analyzed in detail. For the periods where the self-driving car runs fine, the collected data is far less valuable.
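One common way to implement this is a rolling context buffer that is flushed only when an anomaly appears. The sketch below assumes each record carries an anomalous flag (for example, a driver-intervention signal); the record shape and buffer size are illustrative.

```python
from collections import deque

PRE_EVENT_RECORDS = 500  # how much context to keep ahead of an anomaly (assumed)

def select_interesting(records):
    """Forward full detail around anomalies; let routine records age out."""
    context = deque(maxlen=PRE_EVENT_RECORDS)
    for record in records:
        if record.get("anomalous"):
            yield from context  # the lead-up to the event
            context.clear()
            yield record        # the event itself
        else:
            context.append(record)  # kept only if an anomaly follows soon
```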
Security
If you are analyzing data at the edge, you have many of the same data management concerns as you have elsewhere, so you need to protect your data from both theft and malicious corruption. If you are generating sensitive data, the risks of a breach are obvious. If hackers modify your data, the corruption can lead to incorrect insights that hurt your organization. The key here is deploying an edge system that offers the same level of data protection you would have in a typical data center, without tradeoffs.
Location Restrictions
If you face regulations or any other strict policies around where data must be stored, then having analytical capabilities at the data sources makes sense. Also, you can take advantage of the compute power at the edge to anonymize or mask personally identifiable information (PII) so it can be safely delivered to another location while complying with regulatory frameworks.
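As a minimal sketch of masking at the edge, the following Python drops direct PII fields and pseudonymizes an identifier with a keyed hash before the record leaves the site. The field names, key handling and hashing choice are illustrative assumptions, not a statement of what any particular regulation requires.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; load from a key store in practice
PII_FIELDS = {"name", "email", "phone"}        # fields removed outright (assumed schema)
ID_FIELD = "user_id"                           # field pseudonymized instead of removed

def mask_record(record):
    """Return a copy that is safe to transmit: PII stripped, ID pseudonymized."""
    masked = {k: v for k, v in record.items() if k not in PII_FIELDS}
    if ID_FIELD in masked:
        digest = hmac.new(SECRET_KEY, str(masked[ID_FIELD]).encode(), hashlib.sha256)
        # A keyed hash yields a stable pseudonym that still joins records
        # centrally but cannot be reversed without the edge-held key.
        masked[ID_FIELD] = digest.hexdigest()
    return masked
```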
Cost
The cost of the technologies that create data in consumer IoT devices such as smart thermostats or wearables is low, and necessarily so for the economics of IoT to make sense. Compute power for analytics at the edge costs considerably more than those data-creating sensors. Your edge-deployed analytical system does not have to be as inexpensive as a commodity sensor, but it does have to be cost-effective: your choice of technologies should take full advantage of low-cost hardware so that deploying to hundreds or thousands of sites does not overwhelm your budget.
Avoid Data Storms
You might have data sources that create load spikes during unforeseeable events, such as a natural disaster, potentially producing a data storm that overloads your entire IoT network. By handling some analytics processing at the edge, you can absorb such spikes locally and reduce the risk of a data storm shutting down your system.
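A simple way to cap what the edge sends upstream is a token bucket: bursts up to a fixed size pass through, and anything beyond that is handled locally. The rates, capacity and the summarize-the-overflow policy below are illustrative assumptions.

```python
import time

class TokenBucket:
    """Allow at most `rate` events/second upstream, with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def forward(events, bucket, summarize):
    """Pass events upstream while capacity lasts; aggregate the overflow locally."""
    overflow = []
    for event in events:
        if bucket.allow():
            yield event
        else:
            overflow.append(event)   # processed at the edge instead of transmitted
    if overflow:
        yield summarize(overflow)    # one aggregate record instead of a storm
```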
Administrative Complexity
When you deploy analytics at edge sites, you build a network of distributed data centers, and that carries management overhead. Managing a single deployment at home base would be far more efficient, but it is not always practical. With analytics clusters spread across many locations, it is important to choose technologies that are architected for distribution and support a single, global namespace, which lets you manage many data sources as if they were parts of a single cluster.