EDGE ANALYTICS

Edge Analytics – a necessary ingredient in the Internet of Things (IoT)

Depending on whose market research you believe and how you count, in a few years there will be tens of billions of smart and connected devices (“Things”), maybe more. The expectation is that it will not stop there and the growth will continue, fueled by the appetite for expanding the IoT – after all, the migration to IPv6 will increase the number of available IP addresses to a whopping 2^128, or roughly 3.4 × 10^38 (enough to address every single grain of sand on Earth and still have a LOT left over). These devices contain sensors and/or actuators, some computational capabilities, a power source and a connection (direct or indirect) to the Internet; a smart and connected device can therefore include everything from a real nifty thermostat to a smart phone and even a car (if it is connected, which is becoming more prevalent).
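For the curious, the IPv6 arithmetic is easy to check. The sketch below relies only on the fact that an IPv6 address is 128 bits wide; the grains-of-sand figure is an assumed, order-of-magnitude estimate that I am plugging in for illustration, not a number from this post.

```python
# An IPv6 address is 128 bits wide, so the address space holds 2**128 values.
IPV6_ADDRESSES = 2 ** 128

# ASSUMPTION: a rough order-of-magnitude estimate for the number of grains
# of sand on Earth, used for illustration only.
ASSUMED_GRAINS_OF_SAND = 7.5e18


def addresses_per_grain(grains: float = ASSUMED_GRAINS_OF_SAND) -> float:
    """How many IPv6 addresses each grain of sand could get."""
    return IPV6_ADDRESSES / grains


print(f"IPv6 addresses:     {IPV6_ADDRESSES:.3e}")
print(f"addresses per grain: {addresses_per_grain():.3e}")
```

Even if the sand estimate is off by several orders of magnitude, there is still a LOT left over.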

Until very recently, the focus was on making the devices, working out how they connect to the Internet, collecting the data they produce and making this data available for someone to use – a sort of “if you build it (the IoT) they will come”. While collecting the data and transporting it somewhere (for example, “The Cloud”) is a basic requirement, the real value comes from analyzing the data to gain some understanding as to what it all means. A stream of temperature measurements from a sensor in a house or on the street has limited usefulness – we need to process and analyze this data so that we can decide if we need to turn on the air conditioning, for example. Hence, the IoT requires analytics, and so the zoo of analytics that has emerged from the Big Data market – predictive analytics, streaming analytics, real time analytics, etc. – is now being adopted by the IoT industry. Indeed, the IoT is producing Bigger and Bigger Data covering the same 3 V’s – Volume, Variety and Velocity – as all other forms of Big Data.

So, we have a LOT of smart devices collecting a LOT of data and therefore requiring a LOT of analytics (and corresponding compute, storage and networking infrastructure). The traditional, centralized approach, where all the data comes to a relatively small number of data centers where it gets analyzed (“Big Data Analytics in the Cloud”), will simply not scale and will also be very costly (e.g., transporting bits from here to there actually costs money). What is the alternative? Distribute the analytics! Consider: the smart devices have compute capabilities built into them (that’s why they’re smart); in fact, compute hardware is cheap – for literally a few dollars you can get significant computational power, which, in turn, can run analytics on the device itself. Furthermore, multiple devices are usually connected to a local hub/router/gateway where potentially more compute power is available (like Cisco’s IOx), enabling more complex multi-device analytics. Analytics are therefore running on or near the edge (of the network) and delivering Distributed Intelligence.

We can now think of a hierarchy of analytics corresponding to the available compute capabilities – “simple” analytics on the smart device itself, more complex multi-device analytics on the IoT gateways and finally the heavy lifting, the Big Data Analytics, running in the cloud. This distribution of analytics offloads the network and the data centers, creating a model that scales. Related terms have come up recently, such as Fog Computing and Cloudlets (related to mobile computing), also introducing the notion of a hierarchy of computing from the edge to the Cloud.
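To make the hierarchy a bit more concrete, here is a minimal sketch of the first two tiers, assuming each device reduces its raw temperature readings to a small summary and the gateway merges the summaries from several devices. The names (Summary, summarize_on_device, merge_on_gateway) and the sample readings are hypothetical and purely illustrative; a real deployment would look different.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Compact result that travels up the hierarchy instead of raw readings."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def summarize_on_device(readings: Iterable[float]) -> Summary:
    """Tier 1 (the smart device): reduce raw readings to one small summary."""
    values = list(readings)
    if not values:
        raise ValueError("need at least one reading")
    return Summary(len(values), sum(values), min(values), max(values))


def merge_on_gateway(summaries: Iterable[Summary]) -> Summary:
    """Tier 2 (the gateway): combine per-device summaries into one."""
    parts = list(summaries)
    if not parts:
        raise ValueError("need at least one summary")
    return Summary(
        count=sum(p.count for p in parts),
        total=sum(p.total for p in parts),
        minimum=min(p.minimum for p in parts),
        maximum=max(p.maximum for p in parts),
    )


# Tier 3 (the cloud) only ever receives the gateway-level summary.
living_room = summarize_on_device([21.0, 21.5, 22.0])
kitchen = summarize_on_device([23.0, 24.0])
house = merge_on_gateway([living_room, kitchen])
print(house.count, house.minimum, house.maximum, house.mean)
```

Four numbers per device go up the hierarchy instead of every reading. Note what this particular summary gives up: the cloud can still compute a mean, but not, say, a median.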

Edge Analytics is not only about creating a distributed, scalable model for IoT analytics – it has additional benefits:

  • Many business processes do not require “heavy duty” analytics and therefore the data collected, processed and analyzed by Edge Analytics on or near the edge can drive (automatic) “decisions” without ever needing to travel to a remote data center. For example, a local valve can be turned off when Edge Analytics have indicated a leak. This is a sort of “closing the circle” on the edge.
  • Some actions need to be taken in real time (i.e., they cannot tolerate any latency between the sensor-registered event and the reaction to that event) – a situation that arises in industrial control systems – so there is literally no time to transmit the data to a remote Cloud.
  • Some sensors create a LOT of data – video being the primary one. Even when compressed, a typical video camera can produce a few megabits of data every second. Transporting these video streams requires bandwidth, and bandwidth costs money; if, in addition, you want some quality of service guarantees, it becomes even more expensive. Thus, performing video analytics on the edge and transporting only the “results” is much cheaper. If some of the video needs to be stored, depending on the volume, it could be stored near the edge or in a local (on-premises) data center.
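A quick back-of-the-envelope calculation shows how lopsided the video case is. The figures below are assumptions I picked for illustration (a 4 Mbit/s compressed stream, and an edge analytic that reports 1,000 events a day at 500 bytes each); plug in your own numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(bits_per_second: float) -> float:
    """Volume produced in one day by a constant-bitrate stream."""
    return bits_per_second * SECONDS_PER_DAY / 8


# ASSUMPTIONS (illustrative only, not measurements):
ASSUMED_CAMERA_BITRATE = 4_000_000   # compressed video, 4 Mbit/s
ASSUMED_EVENTS_PER_DAY = 1_000       # "results" emitted by the edge analytic
ASSUMED_BYTES_PER_EVENT = 500        # size of one result message

raw_video = bytes_per_day(ASSUMED_CAMERA_BITRATE)
edge_results = ASSUMED_EVENTS_PER_DAY * ASSUMED_BYTES_PER_EVENT

print(f"raw video:    {raw_video / 1e9:.1f} GB/day")
print(f"edge results: {edge_results / 1e6:.1f} MB/day")
print(f"ratio:        {raw_video / edge_results:,.0f} to 1")
```

Under these assumptions a single camera ships about 43 GB a day if you send the raw stream, versus about half a megabyte if you send only the results.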

Since I am a firm believer in “No Free Lunch”, there has to be some price to pay…

So here is what we are trading off –

Ideally, you would want ALL of the data to process and analyze, as there may be relationships, patterns and correlations that could only be seen when using ALL of the data. Edge Analytics is all about processing and analyzing subsets of all the data collected and then only transmitting the results – we are essentially discarding some of the raw data and potentially missing some insights. The question is: can we live with this “loss” and, if so, how should we choose which pieces we are willing to “discard”?

Before answering this question, let me point out that this topic is related to Big Data in general – given that sense-making cannot keep up with the amount of data collected (the so-called Big Data Gap), should we collect/store/analyze EVERYTHING? Or maybe we should be prudent and deal with Smart and Relevant Data. Edge Analytics is one way of making decisions about what we keep and analyze, but here too, the question is: how do we decide what needs to be kept and analyzed? What is Relevant?

The answer to this question is highly domain dependent – some organizations (e.g., militaries) may not be willing to lose ANY data and for them Relevant Data is ALL data. This leads me to conclude that domain expertise is required to decide what can be handled at the edge, given the associated loss of underlying data. Domain experts will be those who know a priori that some pieces of data from different locations are unlikely to be related and therefore can be dealt with locally and possibly acted on locally. Sure, you can always argue that there may be new, unknown, surprising correlations that even domain experts are unaware of and, yes, we may lose these insights. Hence, no free lunch.

I should also mention that distributed systems have a long and rich history and we should keep in mind some of the lessons learned when dealing with such systems. For example, when many “islands” are analyzing and acting on the edge, it may be important to have a single “up-to-date view” somewhere, which, in turn, may impose various constraints (for those interested in some of the more technical details and not faint of heart, see the CAP Theorem). The fact that many of the edge devices are also mobile complicates the situation even more. After all, most of us carry an edge analytics device on us (a smart phone).

Here is the upshot of this discussion –

If you believe that the IoT will expand into every nook and cranny of our lives, where every conceivable object we interact with will become smart and connected, in the home, at work, in industry, indoors and outdoors, then distributing the analytics and the intelligence is inevitable. That’s a good thing – it will help us in dealing with Big Data and relieving bottlenecks in the networks and in the data centers. However, it will require new tools when developing analytic-rich IoT applications – what is the best way of allocating the analytics between the edge and the cloud? How do we keep data consistent? How do we handle mobile edge analytic devices traveling in a highly dynamic environment?

Some domains already employ solutions that are essentially doing some analytics on the edge (e.g., adaptive traffic signal control), but not on the massive scale that will result from the rapid expansion of the IoT. While some people may associate the term “Distributed Intelligence” with apocalyptic visions a la Skynet in “The Terminator” movies, I believe that some real cool and exciting stuff is ahead of us and (excuse the cliché) it will change our lives for the better.

VISUAL DATA

Visual Data – still hard to analyze

Here is an interesting observation: Ask a child to describe what she sees around her and she will immediately tell you something like “I see a tall man talking to a woman in the driveway in front of a yellow house”. The same task is beyond current computer technology – specifically, feeding a “raw” video clip to a machine and getting back (reasonably quickly) a short textual description of what happens in the clip is currently pretty much impossible. Images and video are rich sources of information consisting of many different objects (with different shapes and colors) with some relationship to each other, in some environment, possibly moving (in the case of video), etc. – there is a reason that a picture is worth a thousand words. Analyzing images and video to facilitate automatic insights and associated decisions is still incredibly difficult (even offline; doing it in real time is much harder). A further complication is the fact that most of the visual content we view is actually a 2D projection of the real (3D) world. Remarkably, humans are really good at these types of tasks, so one approach could be “Hey, let’s just copy the human visual system” – we’ll get back to this later.

So, what can we do in the area of Video Analytics or Video Content Analysis? – Actually, quite a bit (but nothing like you may have seen in some popular movies) and here are some examples (certainly not an exhaustive list):

  • Driven by security and surveillance use cases, many “suspicious” behaviors can be recognized automatically (i.e., with no human in the loop) such as an object that has been left behind, someone crossing a virtual line, people counting, loitering and many others (but probably no more than about 20). Similarly, in the vehicular traffic area, behaviors such as stopped vehicle, someone driving on the shoulder, etc., can be identified.
  • Some very specific objects can be recognized – faces, vehicles, license plates and probably a few more – although some of them only under limited conditions (controlled lighting, controlled pose, minimal occlusions, etc.).
  • Specific objects can be tracked within a single camera’s field of view (tracking across multiple cameras, even when there is overlap in successive cameras, is very difficult).
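To give a flavor of what one of these behaviors looks like under the hood, here is a minimal sketch of the “crossing a virtual line” check. It assumes a tracker (not shown) already supplies the object’s position in two consecutive frames, and it tests whether the short path between those positions intersects the virtual line segment. The function names are hypothetical; commercial products will do considerably more (filtering, direction of crossing, and so on).

```python
from typing import Tuple

Point = Tuple[float, float]


def side(a: Point, b: Point, p: Point) -> float:
    """Signed area test: > 0 if p is left of the line a->b, < 0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crosses_line(prev: Point, curr: Point, a: Point, b: Point) -> bool:
    """True if the move prev->curr strictly crosses the virtual segment a->b."""
    # The object must change sides of the virtual line...
    changes_side = side(a, b, prev) * side(a, b, curr) < 0
    # ...and the line's endpoints must lie on opposite sides of the move,
    # otherwise the object passed beyond the end of the segment.
    within_segment = side(prev, curr, a) * side(prev, curr, b) < 0
    return changes_side and within_segment


# Virtual line drawn across a doorway, in pixel coordinates (made-up values).
LINE_START, LINE_END = (0.0, 0.0), (10.0, 0.0)

print(crosses_line((5.0, -1.0), (5.0, 1.0), LINE_START, LINE_END))    # crosses
print(crosses_line((5.0, 1.0), (6.0, 2.0), LINE_START, LINE_END))     # same side
print(crosses_line((20.0, -1.0), (20.0, 1.0), LINE_START, LINE_END))  # misses
```

The geometry is the easy part; the hard part, as the list above suggests, is getting reliable object positions out of the video in the first place.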

If your interest is in some specific items on this limited list – no problem, you can buy them from numerous vendors. However, if you are looking for a different behavior or a different object, you will need some computer vision people to develop a new analytic – the generic object recognizer or the generic “tell me if anything unusual happens in this area” do not exist yet.

But don’t despair – Machine Learning approaches are starting to appear in some commercial products. Basically, the machine is trained, for example, on video that represents normal vehicular traffic flow and once the learning phase is over, the machine can indicate that something abnormal has happened, such as a traffic slowdown due to some sort of incident further down the road. When I say “machine”, by the way, I mean the computer that ingests the video stream and runs the anomaly detection algorithm (which could, in principle, run in the camera itself or very near to it; see my previous post on Edge Analytics).
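The train-on-normal, flag-the-abnormal idea can be illustrated with a deliberately simple stand-in. The sketch below assumes the video has already been reduced to one number per time step (average vehicle speed) and uses a plain mean-and-standard-deviation rule; this is my own toy example, not the method any particular product uses, and the class name, threshold and sample speeds are all made up.

```python
from statistics import fmean, pstdev
from typing import Optional, Sequence


class SpeedAnomalyDetector:
    """Learns what 'normal' looks like, then flags large deviations."""

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold          # allowed deviation, in std devs
        self.mean: Optional[float] = None
        self.std: Optional[float] = None

    def fit(self, normal_speeds: Sequence[float]) -> None:
        """Learning phase: observe traffic that is known to be normal."""
        if len(normal_speeds) < 2:
            raise ValueError("need at least two normal observations")
        self.mean = fmean(normal_speeds)
        self.std = pstdev(normal_speeds)

    def is_anomalous(self, speed: float) -> bool:
        """Detection phase: is this observation far from what was learned?"""
        if self.mean is None or self.std is None:
            raise RuntimeError("call fit() before is_anomalous()")
        return abs(speed - self.mean) > self.threshold * self.std


detector = SpeedAnomalyDetector()
detector.fit([58, 60, 62, 61, 59, 60])   # assumed normal speeds, km/h

print(detector.is_anomalous(61))   # ordinary fluctuation
print(detector.is_anomalous(25))   # slowdown, e.g. an incident down the road
```

Something this small could comfortably run in the camera or on a nearby gateway, with only the “abnormal” flags traveling any further.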

At this point, you are probably saying, “so what about copying the human visual system?” Well, it turns out the HVS is quite complex and we have not figured out how all of it works yet. A lot of progress has been made over the years and a lot of good research is going on (for example, work at MIT, Penn State and others). One of the exciting developments in this area is Deep Learning (which really deserves its own post…), a Machine Learning approach that does really well on tasks where humans are usually better than machines – for example, object recognition in images. DL usually requires a lot of computational resources, but that is slowly evaporating as a real hurdle, and as a result there have been some really exciting results! Google has recently shown an image with a caption that was created automatically – this is actually getting us closer to the target of automatic video summarization described at the beginning of this post. Another one comes from Microsoft, where they managed to outdo humans in an image classification task.

As an aside, some of the world’s top academics in the area of Deep Learning have joined Google, Facebook and Baidu in the last year or two – that should tell you something.

Let me also make the following point – humans are equipped with visual hardware (eyes) that can see in 3D. A lot of stuff gets easier when you also have depth information (e.g., which of two visible objects is in front and which is in back) and there actually are cameras that can record depth information – from stereoscopic cameras to Kinect-like sensors to time-of-flight cameras. In fact, if you have a number of cameras covering the same area from different vantage points, you can do on-the-fly 3D reconstruction (check out FreeD for some real cool clips). Now you can run some advanced video analytics on real-time 3D streams to do things that were simply not possible before (or required ridiculous amounts of computational power).
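For the stereoscopic case, the relationship between what the two cameras see and depth is pleasantly simple. The sketch below assumes an idealized, rectified pinhole stereo pair, where depth equals focal length times baseline divided by disparity (the horizontal shift of the same point between the left and right images). The function name and the sample numbers are illustrative assumptions.

```python
def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Depth in meters for an idealized, rectified stereo pair.

    focal_length_px: focal length expressed in pixels
    baseline_m:      distance between the two camera centers, in meters
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point at finite depth)")
    return focal_length_px * baseline_m / disparity_px


# Assumed rig: 700 px focal length, cameras 10 cm apart.
near = depth_from_disparity(700.0, 0.10, 35.0)
far = depth_from_disparity(700.0, 0.10, 7.0)
print(f"near object: {near:.1f} m, far object: {far:.1f} m")
```

Larger disparity means a closer object, which is exactly the “which object is in front” information that a single 2D camera does not give you directly.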

A final thought –

The estimated number of surveillance cameras in the world is about 210 million (obviously, not counting consumer cameras, smart phone cameras, etc.), producing an obscene amount of stored video, most of which has never been viewed by anyone and most likely never will be – there is just too much of it. Only advanced video analytics will be able “to watch the video for us”, letting us know when there is something interesting there.