Edge Analytics – a necessary ingredient in the Internet of Things (IoT)

Depending on whose market research you believe and how you count, in a few years there will be tens of billions of smart and connected devices (“Things”), maybe more. The expectation is that it will not stop there and that the growth will continue, fueled by the appetite for expanding the IoT – after all, the migration to IPv6 will increase the number of available IP addresses to a whopping 2^128, or roughly 3.4 × 10^38 (enough to address every single grain of sand on Earth and still have a LOT left over). These devices contain sensors and/or actuators, some computational capabilities, a power source and a connection (direct or indirect) to the Internet; a smart and connected device can therefore include everything from a real nifty thermostat to a smartphone and even a car (if it is connected, which is becoming more prevalent).
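To get a feel for the numbers, here is a quick back-of-the-envelope calculation. An IPv6 address is 128 bits long, so the address space holds 2^128 addresses. The figure of 50 billion devices is only an assumption standing in for “tens of billions”.

```python
# IPv6 addresses are 128 bits long, IPv4 addresses are 32 bits long.
ipv6_addresses = 2 ** 128
ipv4_addresses = 2 ** 32

# Assumption for illustration: 50 billion connected devices.
devices = 50 * 10 ** 9

# How many IPv6 addresses would be available per device?
addresses_per_device = ipv6_addresses // devices

print(f"IPv6 addresses:       {ipv6_addresses:.2e}")   # 3.40e+38
print(f"IPv4 addresses:       {ipv4_addresses:.2e}")
print(f"Addresses per device: {addresses_per_device:.2e}")
```

Even with tens of billions of Things, address space is clearly not going to be the limiting factor.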

Until very recently, the focus was on making the devices, working out how they connect to the Internet, collecting the data they produce and making this data available for someone to use – a sort of “if you build it (the IoT) they will come”. While collecting the data and transporting it somewhere (for example, “The Cloud”) is a basic requirement, the real value comes from analyzing the data to gain some understanding as to what it all means. A stream of temperature measurements from a sensor in a house or on the street has limited usefulness on its own – we need to process and analyze this data so that we can decide if we need to turn on the air conditioning, for example. Hence, the IoT requires analytics, and so the zoo of analytics that has emerged from the Big Data market – predictive analytics, streaming analytics, real time analytics, etc. – is now being adopted by the IoT industry. Indeed, the IoT is producing Bigger and Bigger Data, covering the same 3 V’s – Volume, Variety and Velocity – as all other forms of Big Data.

So, we have a LOT of smart devices collecting a LOT of data and therefore requiring a LOT of analytics (and corresponding compute, storage and networking infrastructure). The traditional, centralized approach, where all the data comes to a relatively small number of data centers where it gets analyzed (“Big Data Analytics in the Cloud”), will simply not scale and will also be very costly (e.g., transporting bits from here to there actually costs money). What is the alternative? Distribute the analytics! Consider that smart devices have compute capabilities built into them (that’s why they’re smart); in fact, compute hardware is cheap – for literally a few dollars you can get significant computational power, which, in turn, can run analytics on the device itself. Furthermore, multiple devices are usually connected to a local hub/router/gateway where potentially more compute power is available (like Cisco’s IOx), enabling more complex multi-device analytics. Analytics are therefore running on or near the edge (of the network) and delivering Distributed Intelligence.

We can now think of a hierarchy of analytics corresponding to the available compute capabilities – “simple” analytics on the smart device itself, more complex multi-device analytics on the IoT gateways and finally the heavy lifting, the Big Data Analytics, running in the cloud. This distribution of analytics offloads the network and the data centers, creating a model that scales. Related terms have come up recently, such as Fog Computing and Cloudlets (related to mobile computing), which also introduce the notion of a hierarchy of computing from the edge to the Cloud.
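A minimal sketch of what such a hierarchy might look like in code, assuming temperature sensors as the devices. The function names, the summary fields and the sample readings are all made up for illustration; this is not any particular product's API.

```python
from statistics import mean

def device_summary(readings):
    """'Simple' on-device analytics: reduce a window of raw readings to a summary."""
    return {"n": len(readings), "mean": mean(readings), "max": max(readings)}

def gateway_aggregate(summaries):
    """Multi-device analytics on the gateway: combine the per-device summaries."""
    total = sum(s["n"] for s in summaries)
    # Weight each device's mean by how many readings it covers.
    weighted_mean = sum(s["mean"] * s["n"] for s in summaries) / total
    return {
        "devices": len(summaries),
        "n": total,
        "mean": weighted_mean,
        "max": max(s["max"] for s in summaries),
    }

# Hypothetical raw temperature windows (degrees Celsius) from two devices.
living_room = [21.0, 21.5, 22.0, 21.5]
kitchen = [23.0, 24.0]

# Each device sends only its summary to the gateway...
summaries = [device_summary(living_room), device_summary(kitchen)]

# ...and the gateway forwards a single small record to the cloud.
to_cloud = gateway_aggregate(summaries)
print(to_cloud)
```

Six raw readings go in at the edge and one small record travels to the cloud, where the heavy lifting across many gateways would take place.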

Edge Analytics is not only about creating a distributed, scalable model for IoT analytics – it has additional benefits:

  • Many business processes do not require “heavy duty” analytics and therefore the data collected, processed and analyzed by Edge Analytics on or near the edge can drive (automatic) “decisions” without ever needing to travel to a remote data center. For example, a local valve can be turned off when Edge Analytics have indicated a leak. This is a sort of “closing the circle” on the edge.
  • Some actions need to be taken in real time (i.e., they cannot tolerate any meaningful latency between the sensor-registered event and the reaction to that event) – a situation that arises in industrial control systems – so there is literally no time to transmit the data to a remote Cloud.
  • Some sensors create a LOT of data – video being the primary one. Even when compressed, a typical video camera can produce a few megabits of data every second. Transporting these video streams requires bandwidth, and bandwidth costs money; if, in addition, you want some quality of service guarantees, it becomes even more expensive. Thus, performing video analytics on the edge and transporting only the “results” is much cheaper. If some of the video needs to be stored, depending on the volume, it could be stored near the edge or in a local (on premise) data center.
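The leak example from the first bullet can be sketched as a small local control loop. Everything here is hypothetical: the `Valve` class stands in for a real actuator, and the detection rule (inflow exceeding outflow by more than a tolerance) is just one simple assumption about how a leak might be flagged.

```python
def leak_detected(inflow_lpm, outflow_lpm, tolerance_lpm=0.5):
    """Flag a leak when more water enters a pipe segment than leaves it."""
    return (inflow_lpm - outflow_lpm) > tolerance_lpm

class Valve:
    """Stand-in for a local actuator; a real device would drive hardware here."""
    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False

def edge_control_step(valve, inflow_lpm, outflow_lpm):
    """One pass of the local loop: analyze, act, and return only a small event."""
    if valve.is_open and leak_detected(inflow_lpm, outflow_lpm):
        valve.close()
        return {"event": "leak", "action": "valve_closed"}
    return None  # nothing worth reporting upstream

valve = Valve()
# Hypothetical (inflow, outflow) readings in liters per minute.
readings = [(10.0, 9.9), (10.0, 8.0), (0.0, 0.0)]
events = [edge_control_step(valve, i, o) for i, o in readings]
print(events)
```

The decision is made and acted on locally; only the resulting event, if any, would need to travel to a remote data center.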

Since I am a firm believer in “No Free Lunch”, there has to be some price to pay…

So here is what we are trading off –

Ideally, you would want ALL of the data to process and analyze, as there may be relationships, patterns and correlations that could only be seen when using ALL of the data. Edge Analytics is all about processing and analyzing subsets of all the data collected and then only transmitting the results – we are essentially discarding some of the raw data and potentially missing some insights. The question is whether we can live with this “loss” and, if so, how we should choose which pieces we are willing to “discard”.

Before answering this question, let me point out that this topic is related to Big Data in general – given that sense-making cannot keep up with the amount of data collected (the so-called Big Data Gap), should we collect/store/analyze EVERYTHING? Or maybe we should be prudent and deal with Smart and Relevant Data. Edge Analytics is one way of making decisions about what we keep and analyze, but here too, the question is: how do we decide what needs to be kept and analyzed? What is Relevant?

The answer to this question is highly domain dependent – some organizations (e.g., militaries) may not be willing to lose ANY data and for them Relevant Data is ALL data. This leads me to conclude that domain expertise is required to decide what can be handled at the edge, with the associated loss of underlying data. Domain experts will be those who know a priori that some pieces of data from different locations are unlikely to be related and can therefore be dealt with locally and possibly acted on locally. Sure, you can always argue that there may be new, unknown, surprising correlations that even domain experts are unaware of and, yes, we may lose these insights. Hence, no free lunch.

I should also mention that distributed systems have a long and rich history and we should keep in mind some of the lessons learned when dealing with such systems. For example, when many “islands” are analyzing and acting on the edge, it may be important to have a single “up-to-date view” somewhere, which, in turn, may impose various constraints (for those interested in some of the more technical details and not faint of heart, see the CAP Theorem). The fact that many of the edge devices are also mobile complicates the situation even more. After all, most of us carry an edge analytics device on us (a smartphone).

Here is the upshot of this discussion –

If you believe that the IoT will expand into every nook and cranny of our lives, where every conceivable object we interact with will become smart and connected, in the home, at work, in industry, indoors and outdoors, then distributing the analytics and the intelligence is inevitable. That’s a good thing – it will help us in dealing with Big Data and relieving bottlenecks in the networks and in the data centers. However, it will require new tools for developing analytic-rich IoT applications – what is the best way of allocating the analytics between the edge and the cloud? How do we keep data consistent? How do we handle mobile edge analytic devices traveling in a highly dynamic environment?

Some domains already employ solutions that are essentially doing some analytics on the edge (e.g., adaptive traffic signal control), but not on the massive scale that will result from the rapid expansion of the IoT. While some people may associate the term “Distributed Intelligence” with apocalyptic visions a la Skynet in “The Terminator” movies, I believe that some real cool and exciting stuff is ahead of us and (excuse the cliché) it will change our lives for the better.