Detect anomalies in data using Oracle Accelerated Data Science (ADS) 🧑‍🔬

Some time ago I wrote about issues with the reliability of my Internet connection in The Joys of Unreliable Internet. One thing that came out of this was a script, run every 30 minutes by a cron job on my Raspberry Pi, that checks the download speed of my Internet connection and writes it to a CSV file 🏃.
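A logger of this kind might look something like the minimal sketch below. The function name append_reading, the measure_mbps callable (whatever actually performs the speed test) and the timestamp format are all assumptions made for illustration; only the Date and Speed column names come from the data described in this post.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import Callable


def append_reading(csv_path: Path, measure_mbps: Callable[[], float],
                   fmt: str = "%d/%m/%Y %H:%M") -> None:
    """Append one (Date, Speed) row, writing the header first if the file is new.

    measure_mbps is a hypothetical callable returning the download speed;
    fmt is an assumed timestamp format, not necessarily the one in Speedtest.csv.
    """
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["Date", "Speed"])
        writer.writerow([datetime.now().strftime(fmt), f"{measure_mbps():.2f}"])


# Example call with a stand-in measurement function:
# append_reading(Path("Speedtest.csv"), lambda: 42.0)
```

Passing the measurement in as a callable keeps the CSV handling separate from whichever speed test tool is used, and a cron schedule of */30 * * * * would run it every 30 minutes.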

My Internet has been super-stable since I wrote this script – typical eh! However, I have a lot of data collected, so I thought that I’d attempt to detect anomalies in it. For example, does my Internet slow down on specific days or at specific times 🐌?

The CSV file that I capture the data in has two columns: one for the datetime and one for the download speed.

I noticed that the Oracle Accelerated Data Science (or ADS for short) Python module can detect anomalies in time series data, so it would be perfect for analysing my Internet speed data.

The module can be installed using the following command:

python3 -m pip install "oracle_ads[anomaly]"

Once installed you can run the following to initialize a job.

ads operator init -t anomaly

This creates a folder within the current directory named anomaly with all of the files required to perform anomaly detection. I copied the CSV file with my Internet speed data into this folder (Speedtest.csv).

I then opened the anomaly.yaml file within this directory – this contains the configuration settings for the job.

I updated the template anomaly.yaml file as follows:

  • Specified the name of the datetime column (Date)
  • Specified the target column, which contains the data points to be analysed (Speed)
  • Set the location of the file containing the data to analyse (Speedtest.csv)
  • Specified the format of the datetime column (using standard Python notation) – full documentation on this can be found here.
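Since the datetime format is given in Python notation, it can be worth checking the format string against a few values from the Date column before running the job. The sketch below does that with datetime.strptime. The sample timestamps and the format "%d/%m/%Y %H:%M" are made up for illustration and are not necessarily what Speedtest.csv contains.

```python
from datetime import datetime


def first_bad_timestamp(values, fmt):
    """Return the first value that does not parse with fmt, or None if all parse."""
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return value
    return None


# Illustrative values only; the last one deliberately uses a different layout.
sample = ["01/02/2024 09:30", "01/02/2024 10:00", "2024-02-01 10:30"]
print(first_bad_timestamp(sample, "%d/%m/%Y %H:%M"))  # prints 2024-02-01 10:30
```

A result of None means every value matched the format string.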

I saved anomaly.yaml and then ran the following command to run the anomaly detection job:

ads operator run -f anomaly.yaml

Top tip – if you are running this on macOS and receive an SSL error, you’ll likely need to run Install Certificates.command, which can be found in the Python folder within Applications.

The job took a few seconds to run (I only had 200KB of data to analyse). It created a results folder within the anomaly folder, and within this are two files: a report in HTML format and a CSV file containing details of all of the anomalies detected.

The report looks like this (the red dots are the anomalies detected).

Full details of all anomalies can be found in the outliers.csv file, which also contains a score for each one (the higher the number, the worse).
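To look at the worst anomalies first, the rows can be sorted by score. The following is a rough sketch only: the column names (Date, Speed, score) and the sample rows are assumptions for illustration, so check the header of your own outliers.csv and pass the real score column name.

```python
import csv
import io


def top_anomalies(rows, score_column, n=5):
    """Return the n rows with the highest score, worst first."""
    return sorted(rows, key=lambda row: float(row[score_column]), reverse=True)[:n]


# Made-up rows standing in for outliers.csv; the real column names may differ.
sample_csv = io.StringIO(
    "Date,Speed,score\n"
    "05/02/2024 21:00,12.4,0.91\n"
    "07/02/2024 03:00,48.0,0.35\n"
    "12/02/2024 21:30,9.8,0.97\n"
)
rows = list(csv.DictReader(sample_csv))
for row in top_anomalies(rows, score_column="score", n=2):
    print(row["Date"], row["score"])
```

With a real file, the io.StringIO object would be replaced by open("outliers.csv", newline="").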

This identified several days (along with the timeslots) on which my Internet speed varied significantly from the average 📉.

I’ll probably run this again in a few months to see if I can spot any patterns, such as specific days or timeslots when the download speed varies from the norm.
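One simple way to look for such patterns is to count the detected anomalies by weekday and hour. The sketch below assumes the anomaly timestamps have already been read into a list of strings; the timestamps and the format string are invented for illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def anomalies_by_slot(timestamps, fmt):
    """Count anomalies per (weekday name, hour of day)."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, fmt)
        counts[(DAYS[dt.weekday()], dt.hour)] += 1
    return counts


# Illustrative timestamps only, in an assumed day/month/year format.
stamps = ["05/02/2024 21:00", "12/02/2024 21:30", "07/02/2024 03:00"]
counts = anomalies_by_slot(stamps, "%d/%m/%Y %H:%M")
print(counts.most_common(1))  # prints [(('Monday', 21), 2)]
```

Slots that keep appearing at the top of most_common() across several runs would be the ones worth investigating.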

Hope you all have as much fun as I did anomaly detecting! 🔎
