
Anomaly detection 3.0: from signatures to unsupervised learning

May 19, 2022
Giulio Galvan

Let’s say we want to check whether you had a normal day by looking at your average heart rate and counting the number of steps you took.

Suppose we were in the nineties (or at certain out-of-date companies). You would start by enumerating all the strange things that could happen to you, and then come up with a set of rules, or signatures, coded in Pascal or whatever language was in fashion at the time.
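For instance, such signatures might look like the hypothetical function below. It is written in Python rather than Pascal, and every threshold in it is a made-up value chosen only for illustration.

```python
# Hypothetical 1990s-style signature rules, rewritten in Python.
# All thresholds are made-up values for illustration only.

def is_anomalous_day(avg_heart_rate: float, steps: int) -> bool:
    """Return True if any hand-written rule fires."""
    if avg_heart_rate > 100:   # rule 1: heart racing all day
        return True
    if avg_heart_rate < 50:    # rule 2: suspiciously slow heart
        return True
    if steps < 1000:           # rule 3: barely moved
        return True
    if steps > 30000:          # rule 4: ran a marathon?
        return True
    return False


print(is_anomalous_day(72, 8000))   # False: every rule stays silent
print(is_anomalous_day(101, 8000))  # True: rule 1 fires
```

Note how brittle each threshold is: an average of 100 bpm passes, while 101 bpm trips rule 1.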

Booooring!

But more importantly, how can the poor programmer know what is normal for you? Heart rate varies a lot with age, fitness level, and so on. And how is he supposed to know how much you usually walk? And how should the threshold for each rule be set? Moving it by +1 or -1 can make the difference between a perfectly normal day and an anomalous one.

A little further down the road, machine learning (the savior from all evil) comes to the rescue, and the if-else statements are replaced by machine learning models for binary classification.

Now that’s what you were looking for. ML takes a fresh, data-driven approach, and you don’t have to write all the if-else statements yourself. You just need data.

So how does ML work? Well, it collects a lot of data about your days and plots them on a Cartesian plane, with one dimension per feature (average heart rate and step count).

Each point is one of your days. You have patiently labeled each one as either normal or anomalous, and now ML has to figure out, automagically, a way to tell them apart.

That’s what ML excels at: it learns a decision boundary that separates the normal points from the anomalous ones.
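Here is a minimal sketch of the supervised approach. It assumes a handful of made-up labeled days and uses a nearest-centroid classifier as a deliberately simple stand-in for a real binary classification model such as logistic regression or a decision tree. Features are standardized first, so that heart rate (tens of bpm) and steps (thousands) weigh comparably.

```python
import math

# Hypothetical labeled days: (average heart rate in bpm, steps). Made-up data.
NORMAL = [(68, 8000), (72, 9500), (70, 7000), (75, 10000), (66, 8500)]
ANOMALOUS = [(105, 6000), (110, 4000), (98, 5000)]  # e.g. days with a fever


def fit_scaler(points):
    """Per-feature mean and standard deviation, so bpm and steps are comparable."""
    n = len(points)
    means = [sum(p[i] for p in points) / n for i in range(2)]
    stds = [math.sqrt(sum((p[i] - means[i]) ** 2 for p in points) / n) for i in range(2)]
    return means, stds


def scale(point, means, stds):
    """Standardize one point: subtract the mean, divide by the std, per feature."""
    return tuple((point[i] - means[i]) / stds[i] for i in range(2))


def centroid(points):
    """Average of a list of 2D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))


def train(normal, anomalous):
    """Nearest-centroid classifier: one prototype per class, in scaled space."""
    means, stds = fit_scaler(normal + anomalous)
    c_norm = centroid([scale(p, means, stds) for p in normal])
    c_anom = centroid([scale(p, means, stds) for p in anomalous])
    return means, stds, c_norm, c_anom


def predict(model, point):
    """Assign the label of whichever class prototype is closer."""
    means, stds, c_norm, c_anom = model
    x = scale(point, means, stds)
    return "anomalous" if math.dist(x, c_anom) < math.dist(x, c_norm) else "normal"


model = train(NORMAL, ANOMALOUS)
print(predict(model, (71, 9000)))   # normal
print(predict(model, (108, 4500)))  # anomalous
```

The decision boundary here is simply the line halfway between the two class prototypes in scaled space. The model only knows about the two kinds of days it was shown.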

Cool, right? Except… what if my dataset did not contain a day when you broke your foot and had to stay home? Or a day when you had a heart att… no, better not 🙂

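Unfortunately, that wouldn’t necessarily be detected. A classifier only learns to tell apart the kinds of days it has seen labeled, so a new kind of anomaly can easily land on the “normal” side of the boundary. You are teaching your model to distinguish right from wrong instead of simply teaching it what is normal (for you).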

Fortunately, recent advances in education have finally reached ML too (yes, we gave up on corporal punishment a few years ago). Welcome to the era of unsupervised learning.

The idea is to encircle the normal data points and enclose them in a ball (or, more mathematically speaking, describe them with a probability distribution), so that every point outside it is considered anomalous, wherever it falls on the plane. Building the ball only requires examples of normal days, not labeled anomalies.
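Here is a minimal sketch of one way to build such a ball. It assumes that normal days are roughly Gaussian. We fit a mean μ and covariance Σ to normal days only (no anomaly labels), then flag a day x when its squared Mahalanobis distance d²(x) = (x - μ)ᵀ Σ⁻¹ (x - μ) exceeds the 99% quantile of a chi-square distribution with 2 degrees of freedom. The “ball” becomes an ellipse that stretches along the directions in which your days naturally vary. The Gaussian model, the 99% cutoff and the data are all assumptions made for illustration. With parameters estimated from only a few days, the cutoff is approximate, and other density estimators or one-class methods could be used instead.

```python
import math

# Hypothetical history of *normal* days only: (average heart rate in bpm, steps).
# No anomaly labels are needed. The numbers are made up for illustration.
NORMAL_DAYS = [
    (68, 8000), (72, 9500), (70, 7000), (75, 10000),
    (66, 8500), (71, 9000), (69, 7500), (73, 8800),
]

# 99% quantile of a chi-square with 2 degrees of freedom: -2 * ln(0.01) ≈ 9.21.
# Under the Gaussian assumption, about 99% of normal days fall inside this ellipse.
THRESHOLD = -2 * math.log(0.01)


def fit_gaussian(days):
    """Estimate the mean vector and 2x2 sample covariance of the normal days."""
    n = len(days)
    mx = sum(d[0] for d in days) / n
    my = sum(d[1] for d in days) / n
    sxx = sum((d[0] - mx) ** 2 for d in days) / (n - 1)
    syy = sum((d[1] - my) ** 2 for d in days) / (n - 1)
    sxy = sum((d[0] - mx) * (d[1] - my) for d in days) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(model, day):
    """Squared Mahalanobis distance of a day from the fitted distribution."""
    (mx, my), (sxx, sxy, syy) = model
    a, b = day[0] - mx, day[1] - my
    det = sxx * syy - sxy ** 2
    # Closed-form inverse of the 2x2 covariance matrix, applied to (a, b).
    return (syy * a * a - 2 * sxy * a * b + sxx * b * b) / det


def is_anomaly(model, day):
    """A day is anomalous if it lies outside the 99% ellipse."""
    return mahalanobis_sq(model, day) > THRESHOLD


model = fit_gaussian(NORMAL_DAYS)
print(is_anomaly(model, (71, 8700)))  # False: an ordinary day
print(is_anomaly(model, (65, 300)))   # True: broken foot, never seen before
```

The broken-foot day is flagged even though nothing like it appears in the data, because it lies far from everything the model considers normal for you.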

Now anything that falls outside your normal will be flagged by a model tailored to you, and you don’t have to hand-code any rules. All you need is data.
