To generate richer and more timely insights, enterprises are using increasing amounts of data. Expect that trend to continue. An IDC model projects that the global datasphere will roughly triple in size by 2025.
But this trend isn’t new. After all, the term Big Data has been with us for quite a while. What’s different is where the data will emanate from and how fluid it will be. In other words, mobile and IoT – the edge – will drive data creation.
Further, processing and analysis will happen at various points: on devices, at gateways, and across the cloud. Perhaps a better term than Big Data would be Fluid Distributed Data.
Regardless, more data ultimately translates to more viable business opportunities – particularly given that this new data is generated at the point of action from humans and machines.
To take full advantage of the growing amounts of data available to them, enterprises need a way to manage it more efficiently across platforms, from the edge to the cloud and back. They need to process, store and optimize different types of data coming from different sources with different levels of cleanliness and validity. They need to connect this data to internal applications and apply business process logic, increasingly aided by artificial intelligence and machine learning models.
It’s a big challenge. One solution enterprises are pursuing now is the adoption of a data fabric. And, as data volumes continue to grow at the network’s edge, that solution will evolve further into what will more commonly be referred to as an edge data fabric.
What Is a Data Fabric?
A data fabric is a unifying data layer through which distributed data can be accessed easily and transparently – in real time and under common management. The data fabric enables operators to move and access data across different deployment platforms, data processes, geographic locations and structural approaches.
Essentially, a data fabric acts as both the plumbing and translator for data moving onto and off different platforms – including data centers, the public cloud, private clouds and the many types of gateways and devices operating at the edge.
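As a toy illustration of that "plumbing and translator" role, the sketch below normalizes records arriving from two platforms into one unified view. All names and record shapes here are hypothetical, invented for illustration – not any vendor's actual API:

```python
# Hypothetical sketch: a data fabric translates records from different
# platforms (data center, cloud) into one common schema.

def from_datacenter(row):
    # Assume legacy rows arrive as pipe-delimited strings.
    device, reading = row.split("|")
    return {"source": "datacenter", "device": device, "value": float(reading)}

def from_cloud(doc):
    # Assume cloud events arrive as dicts with different key names.
    return {"source": "cloud", "device": doc["deviceId"], "value": doc["payload"]}

TRANSLATORS = {"datacenter": from_datacenter, "cloud": from_cloud}

def unify(source, record):
    """Translate a platform-specific record into the fabric's common format."""
    return TRANSLATORS[source](record)

records = [
    unify("datacenter", "pump-7|3.5"),
    unify("cloud", {"deviceId": "valve-2", "payload": 1.0}),
]
```

The point of the registry is that each new platform only needs a translator function; the rest of the fabric sees one schema.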
How Data Fabric Applies to Edge Computing
Edge computing provides a unique set of challenges for data being generated and processed outside the network core. The devices themselves operating at the edge are getting more complex.
Smart devices such as networked PLCs manage solenoids that, in turn, control process flows in a chemical plant; pressure sensors that determine the weight of a cargo container; and active RFID tags that determine its location. The vast majority of the processing used to take place in the data center, but that has shifted to the point where a larger portion of the processing takes place in the cloud. In both cases, the processing happens on one side of a gateway.
The data center was fixed, not virtual, but the cloud is fluid. If you consider the definition of cloud, you can see why a data fabric would be needed in it. Cloud is about fluidity and removing locality, but, like the data center, it’s about processing data associated with applications.
We may not care where the Salesforce cloud, Oracle cloud or any other cloud is actually located, but we do care that our data must transit between various clouds and persist in each of them for use in different operations.
Because of all that complexity, organizations have to determine which pieces of the processing are done at which level. There's an application at each level, for each application there's a data manipulation, and for each manipulation there's processing and memory management.
The point of a data fabric is to handle all that complexity. Spark, for example, would be a key element of a data fabric in the cloud, as it has quickly become the easiest way to support streaming data between cloud platforms from different vendors.
The edge is quickly becoming a new cloud, leveraging the same cloud technologies and standards in combination with new, edge-specific networks such as 5G and Wi-Fi 6. And, like the core cloud, there are richer, more intelligent applications running on each device, on gateways, and at what would have been the equivalent of a data center running in a coat closet on the factory floor, in an airplane, on a cargo ship and so forth. It stands to reason you will need an edge data fabric analogous to the one solidifying in the core cloud.
Edge Data Fabric’s Common Elements
To handle the growing data requirements edge devices pose, an edge data fabric has to perform several important functions. It has to be able to:
- Access many different interfaces: HTTP, MQTT, radio networks, manufacturing networks
- Run on multiple operating environments: most importantly, POSIX-compliant ones
- Work with key protocols and APIs: including more recent ones such as REST APIs
- Provide JDBC/ODBC database connectivity: for legacy applications and quick-and-dirty connections between databases
- Handle streaming data: through frameworks such as Spark and Kafka
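One way to picture those interface requirements is as a pluggable adapter layer: the fabric registers a handler per protocol and routes every incoming payload through the matching one. The sketch below is a hypothetical illustration with stand-in handlers – there is no real HTTP or MQTT stack behind it:

```python
# Hypothetical sketch: an edge data fabric registers one handler per
# protocol (HTTP, MQTT, ...) and routes incoming payloads through it.

class EdgeFabric:
    def __init__(self):
        self._handlers = {}

    def register(self, protocol, handler):
        """Plug in a handler for one protocol."""
        self._handlers[protocol] = handler

    def ingest(self, protocol, payload):
        """Route a payload through the handler for its protocol."""
        handler = self._handlers.get(protocol)
        if handler is None:
            raise ValueError(f"no handler for protocol: {protocol}")
        return handler(payload)

fabric = EdgeFabric()
# Stand-in handlers; a real fabric would wrap actual protocol stacks.
fabric.register("http", lambda body: {"via": "http", "body": body})
fabric.register("mqtt", lambda msg: {"via": "mqtt", "topic": msg[0], "data": msg[1]})

result = fabric.ingest("mqtt", ("plant/line1/temp", 72.4))
```

Adding support for a new interface – a radio network, a manufacturing bus – then means registering one more handler rather than changing the fabric itself.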
Edge Data Fabric is at an Inflection Point
While edge computing's origins date back to content delivery networks in the 1990s, the market is only now reaching a tipping point for an edge data fabric.
The key drivers for edge computing have changed. For us to truly harness all this intelligence and all this processing being done at the edge, we will have to shed the client-server mentality. The days of single-location data centralization are gone. The majority of data is going to stay at the edge.
As you get more intelligence at the edge, it directly executes automated routines. You embed policy around that automation, along with directions for what the exception-handling routines should do, and you iterate over time so that nothing has to happen manually and the process never comes to a halt. You do that by tying machine learning (ML) to the policy, the process, and the exception handling – and that ML has to run in an unsupervised fashion at the edge.
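A minimal way to picture unsupervised exception handling at the edge is a rolling statistical baseline with a policy hook: the device learns "normal" from its own recent readings and flags anything that drifts far outside it. The window size and z-score threshold below are illustrative assumptions, not a production ML model:

```python
import statistics
from collections import deque

# Hypothetical sketch: keep a local rolling baseline and flag exceptions
# on-device, without shipping raw readings to the cloud.

class EdgeBaseline:
    def __init__(self, window=50, z_threshold=3.0):
        self.window = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value):
        """Return 'exception' if the value deviates strongly from the baseline."""
        verdict = "normal"
        if len(self.window) >= 10:  # wait for a minimal baseline
            mean = statistics.fmean(self.window)
            stdev = statistics.pstdev(self.window) or 1e-9
            if abs(value - mean) / stdev > self.z_threshold:
                verdict = "exception"  # policy hook: run the exception routine
        self.window.append(value)
        return verdict

monitor = EdgeBaseline()
# Fifteen readings near 10.0 build the baseline; a spike to 25.0 is flagged.
results = [monitor.observe(v) for v in [10.0, 10.1, 9.9] * 5 + [25.0]]
```

The policy around the `"exception"` verdict – escalate, shut a valve, page an operator – is exactly the part the author describes iterating over time.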
Why not move it all into the cloud? It would take a lot of bandwidth. What we've learned is that every jump – from 2G to 3G to 4G to LTE to 5G – brings a bandwidth surge, yet each time the data generated at the edge grows even faster than the new bandwidth, so proportionally less and less can be done in the cloud. Call it the bandwidth paradox.
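The bandwidth paradox can be made concrete with a back-of-the-envelope calculation. The growth rates below are illustrative assumptions (edge data doubling roughly every 18 months; a generational network upgrade every five years with a hypothetical 8x effective uplift), not measured figures:

```python
# Illustrative assumptions, not measurements:
doubling_period_years = 1.5      # assumed edge-data doubling time
years_between_generations = 5    # e.g., roughly 4G -> 5G
bandwidth_multiplier = 8.0       # assumed effective uplift per generation

# How much the data grows over one network generation.
data_growth = 2 ** (years_between_generations / doubling_period_years)

# Fraction of the (now much larger) data the new link could carry,
# relative to what the old link could carry of the old data volume.
shippable_fraction = bandwidth_multiplier / data_growth

print(f"data grows {data_growth:.1f}x per generation")
print(f"relative shippable fraction: {shippable_fraction:.2f}")
```

Under these assumed numbers the data grows roughly 10x per generation against an 8x bandwidth uplift, so each upgrade actually shrinks the share of edge data that can practically be backhauled – the paradox in miniature.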
Another reason not to move data back to the cloud: latency. Even if you could move all the data there, if you're trying to execute an automated process, decisions have to be made in real time. Making that decision in the cloud and sending it back to the point of action would create too much latency – even at 5G speeds.
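A simple latency budget shows why. All the figures below are illustrative assumptions, not measurements of any real network or workload:

```python
# Illustrative latency budget (hypothetical figures, not measurements).
propagation_ms = 20.0   # assumed one-way edge-to-regional-cloud delay
processing_ms = 15.0    # assumed cloud-side queuing plus inference
round_trip_ms = 2 * propagation_ms + processing_ms  # there and back

control_loop_deadline_ms = 10.0  # assumed deadline for a fast control loop
local_inference_ms = 2.0         # assumed on-gateway decision time

meets_deadline_via_cloud = round_trip_ms <= control_loop_deadline_ms
meets_deadline_locally = local_inference_ms <= control_loop_deadline_ms
```

With these assumed numbers the cloud round trip (55 ms) blows a 10 ms control-loop deadline that local inference meets comfortably – which is the whole argument for deciding at the point of action.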
A third reason: privacy and security. With all the risks organizations face, the best thing to do is build your historical baseline, run it locally, retire the back-end historical data over time, keep only the data you need, and discard data as fast as you can. If you're going to do that, why put everything in the cloud anyway? Just do everything locally.
Edge Data Fabric Use Cases
An edge data fabric will provide support for open communities to build application functionality into what were previously closed networks and systems. These could include equipping 5G Wireless Networks with Multi-Access Edge Computing (MEC) to open up the network for third-party developers and integrators to build content delivery networks.
An edge data fabric also could unlock opportunities for a multi-layer IoT grid – with PLCs on one layer, machine vision on another and robotics on yet another layer – to share data between these layers. For third-party vendors to design and productize such a grid, an edge data fabric would have to be present.
Moving Cloud Data Fabric to the Edge
Of course, historical data from the edge will need to flow back to the ML algorithm developers for design, tuning, and drift adjustment. Key pieces of edge data, such as financial transactions, will flow to the core cloud, while equally pertinent but relatively small data sets – say, parts information or the scheduling to install those parts – will flow from core systems to the edge.
This speaks to the fluidity of data and the need to seamlessly connect the edge data fabric to the core cloud data fabric. Given that standards like MEC are now migrating cloud technologies to the edge, the prognosis for extending the cloud data fabric to the edge looks promising.
About the Author:
Lewis Carr, Senior Director of Product Marketing at Actian