Anomaly Detection in Time Series Data


Anomaly detection is the process of identifying data points or patterns in a dataset that deviate significantly from the norm. A time series is a collection of data points gathered over a period of time. Anomaly detection in time series data can be useful in various industries, including manufacturing, healthcare, and finance. It can be performed with unsupervised learning approaches such as clustering, PCA (Principal Component Analysis), and autoencoders.

What’s an Anomaly Detection Algorithm?

Anomaly detection is the process of identifying data points that deviate from the expected patterns in a dataset. Anomaly detection techniques are widely used in applications such as fraud detection, intrusion detection, and failure detection. The goal of anomaly detection is to find unusual or very rare events that could point to a possible danger, issue, or opportunity.

The autoencoder algorithm is an unsupervised deep learning algorithm that can be used for anomaly detection in time series data. An autoencoder is a neural network that learns to reconstruct its input. It first compresses the input into a lower-dimensional representation and then expands it back to its original dimensions. For anomaly detection, an autoencoder can be trained on typical time series data so that it learns a compressed representation of that data. The reconstruction error between the original and reconstructed data then serves as the anomaly score. Data points with large reconstruction errors are flagged as anomalies.
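PCA, which is also listed above, follows the same compress-and-reconstruct idea with a linear mapping. That makes it a compact illustration of reconstruction-error scoring. The sketch below is not the autoencoder used later in this article. It uses NumPy's SVD as a linear encoder/decoder, and the data are synthetic.

```python
import numpy as np

def pca_reconstruction_scores(train, data, n_components):
    """Score rows of `data` by reconstruction error under a PCA model fit on `train`.

    PCA acts as a linear stand-in for an autoencoder: project onto the top
    `n_components` principal directions (encode), map back (decode), and
    measure the mean squared difference per row.
    """
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]               # shape (k, d)
    codes = (data - mean) @ components.T         # encode: shape (n, k)
    reconstructed = codes @ components + mean    # decode: shape (n, d)
    return np.mean((data - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)
# Synthetic "normal" data: 3 features lying close to a 1-D line.
t = rng.normal(size=(200, 1))
normal = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))

# Two test points: one on the learned line, one far off it.
test = np.array([[1.0, 2.0, -1.0], [1.0, -2.0, 1.0]])
scores = pca_reconstruction_scores(normal, test, n_components=1)
print(scores)  # the off-line point gets a much larger score
```

The point that follows the learned pattern is reconstructed almost exactly. The other point cannot be represented by one component and gets a large error, which is the behavior an autoencoder-based detector relies on.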

Time Series Data and Anomaly Detection

Time series data is a collection of observations recorded over time. For such data, anomaly detection algorithms are especially important because they help us spot unusual patterns that might not be obvious from simply looking at the raw data. Anomalies in time series data may appear as abrupt increases or decreases in values, unusual patterns, or unexpected seasonality.

  • Time series data can be used to teach anomaly detection algorithms, such as the autoencoder, how to represent typical patterns. These algorithms can then use this representation to find anomalies. By training an autoencoder on normal time series data, the model learns a compressed version of the data. The anomaly score is then calculated from the reconstruction error between the original and reconstructed data, and data points with large reconstruction errors are flagged as anomalies.
  • Anomaly detection algorithms can be applied to time series data to find unusual patterns that may point to a danger, issue, or opportunity. For example, in predictive maintenance, a time series anomaly may point to a potential equipment failure that can be fixed before it causes significant downtime or safety concerns. In finance, anomalies in time series data may reveal market movements or patterns in financial forecasts that can be capitalized on.
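An autoencoder needs fixed-size input vectors. A common way to let it learn the typical patterns of a series, rather than just typical single values, is to slice the series into overlapping windows. Each window then becomes one training example. The sketch below assumes NumPy 1.20 or later (for `sliding_window_view`). The helper name `make_windows`, the readings, and the window size are made up for illustration.

```python
import numpy as np

def make_windows(series, window_size):
    """Turn a 1-D series into overlapping windows of length `window_size`.

    Row i holds series[i : i + window_size], so a series of length n
    yields n - window_size + 1 rows that a model can learn to reconstruct.
    """
    series = np.asarray(series, dtype=np.float32)
    if window_size > len(series):
        raise ValueError("window_size must not exceed the series length")
    # Returns a read-only view; call .copy() if the windows must be modified.
    return np.lib.stride_tricks.sliding_window_view(series, window_size)

# Hypothetical readings with one abrupt spike at index 3.
values = [70.1, 70.4, 71.0, 85.9, 70.8, 70.2]
windows = make_windows(values, window_size=3)
print(windows.shape)  # (4, 3)
```

With windows as input, the model's input dimension would become the window size. A window's reconstruction error would then reflect how unusual its overall shape is, not only how unusual a single reading is.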

Note on the evaluation in this article: the precision, recall, and F1 score of 1.0 reported below do not show that the model found the true anomalies in the “ambient_temperature_system_failure.csv” dataset from the NAB repository. The code builds its “ground-truth” labels from the model's own predictions. The predictions are therefore compared with themselves, so every metric is trivially 1.0. A meaningful evaluation requires independent labels, such as the labeled anomaly windows that NAB publishes for its datasets.

Importing Libraries and Dataset

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.

  • Pandas – This library helps to load the data frame in a 2D array format and has multiple functions to perform analysis tasks in one go.
  • Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib/Seaborn – These libraries are used to draw visualizations.
  • Sklearn – This module contains multiple libraries with pre-implemented functions to perform tasks from data preprocessing to model development and evaluation.
  • TensorFlow – This is an open-source library used for Machine Learning and Artificial Intelligence. It provides a range of functions to achieve complex functionality with single lines of code.

Python3

import pandas as pd
import tensorflow as tf
from keras.layers import Input, Dense
from keras.models import Model
from sklearn.metrics import precision_recall_fscore_support
import matplotlib.pyplot as plt

In this step, we import the libraries required to implement the anomaly detection algorithm with an autoencoder:

  • pandas for reading and manipulating the dataset.
  • TensorFlow and Keras for building the autoencoder model.
  • scikit-learn for calculating precision, recall, and F1 score.
  • Matplotlib for plotting the results.

Python3

data = pd.read_csv(
    '/NAB/master/data/realKnownCause/ambient'
    '_temperature_system_failure.csv')

# Drop the timestamp column and keep the numeric values
data_values = data.drop('timestamp',
                        axis=1).values

# Convert the values to float32
data_values = data_values.astype('float32')

# Rebuild a DataFrame using the original non-timestamp column names
data_converted = pd.DataFrame(data_values,
                              columns=data.columns[1:])

# Add the timestamp column back as the first column
data_converted.insert(0, 'timestamp',
                      data['timestamp'])

We load a dataset called “ambient_temperature_system_failure.csv” from the Numenta Anomaly Benchmark (NAB). It contains time series data of ambient temperature readings from a system that experienced a failure.

The pandas library is used to read the CSV file from a remote location on GitHub and store it in a variable called “data”.

  • Next, the code drops the “timestamp” column from “data”, since it is not needed for the numerical analysis, and stores the remaining values in a variable called “data_values”.
  • Then, “data_values” is converted to the “float32” data type to reduce memory usage, and a new pandas DataFrame called “data_converted” is created from the converted values. Its columns are labeled with the original column names from “data”, except for the “timestamp” column that was dropped.
  • Finally, the code adds the “timestamp” column back to “data_converted” as the first column using the “insert()” method. The resulting DataFrame “data_converted” contains the same data as “data”, with the value column stored as float32, in a format that can be used for analysis and visualization.
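The loading steps above can be tried without network access on a tiny in-memory CSV. The sketch assumes the file has the two columns the article's code uses (“timestamp” and “value”). The rows are invented placeholders, not real NAB readings. It also parses the timestamps with `pd.to_datetime`, which the article's code does not do, and drops missing values as in the next step.

```python
import io

import pandas as pd

# Invented stand-in for the NAB CSV: same two columns, placeholder rows.
csv_text = """timestamp,value
2020-01-01 00:00:00,70.0
2020-01-01 01:00:00,71.5
2020-01-01 02:00:00,
2020-01-01 03:00:00,70.5
"""

data = pd.read_csv(io.StringIO(csv_text))

# Keep only the numeric columns and cast them to float32.
values = data.drop('timestamp', axis=1).values.astype('float32')
converted = pd.DataFrame(values, columns=data.columns[1:])

# Put the timestamp back as the first column, parsed as datetimes
# (the parsing is an addition; the article keeps the raw strings).
converted.insert(0, 'timestamp', pd.to_datetime(data['timestamp']))

# The empty "value" field was read as NaN; drop that row.
converted = converted.dropna()

print(converted.dtypes)
print(len(converted))  # 3
```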

Python3

data_converted = data_converted.dropna()

We remove any rows with missing (NaN) values from the dataset.

Anomaly Detection using Autoencoder

An autoencoder is a type of neural network that learns to compress and then reconstruct the original data. Data points that it reconstructs poorly can then be identified as anomalies.
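For reference, a single-hidden-layer autoencoder of the kind built in this section can be written as follows. Here d stands for input_dim, k for encoding_dim, x for one row of the data, and s(x) for the per-point reconstruction error used as the anomaly score. These symbols are introduced for illustration only and do not appear in the original code.

```latex
\begin{aligned}
h       &= \mathrm{ReLU}(W_1 x + b_1),     & W_1 &\in \mathbb{R}^{k \times d},\\
\hat{x} &= \mathrm{ReLU}(W_2 h + b_2),     & W_2 &\in \mathbb{R}^{d \times k},\\
s(x)    &= \frac{1}{d} \sum_{j=1}^{d} \left(x_j - \hat{x}_j\right)^2 .
\end{aligned}
```

Training with loss='mse' minimizes the average of s(x) over each batch. After training, points that resemble the bulk of the data tend to have a small s(x), and unusual points a large one.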

Python3

# Convert the numeric columns (everything except the timestamp) to a tensor
data_tensor = tf.convert_to_tensor(data_converted.drop(
    'timestamp', axis=1).values, dtype=tf.float32)

# Number of input features and size of the encoding layer
input_dim = data_converted.shape[1] - 1
encoding_dim = 10

# Single-hidden-layer autoencoder: input -> encoder -> decoder
input_layer = Input(shape=(input_dim,))
encoder = Dense(encoding_dim, activation='relu')(input_layer)
decoder = Dense(input_dim, activation='relu')(encoder)
autoencoder = Model(inputs=input_layer, outputs=decoder)

# Train the model to reproduce its own input
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(data_tensor, data_tensor, epochs=50,
                batch_size=32, shuffle=True)

# Per-row mean squared reconstruction error serves as the anomaly score
reconstructions = autoencoder.predict(data_tensor)
mse = tf.reduce_mean(tf.square(data_tensor - reconstructions),
                     axis=1)
anomaly_scores = pd.Series(mse.numpy(), name='anomaly_scores')
anomaly_scores.index = data_converted.index

We define the autoencoder model and fit it to the cleaned data. The model is trained to minimize the mean squared error between its input and its output, so it learns the regular patterns in the data. The trained model then computes the reconstruction error for each data point, and that error is used as the point's anomaly score. Note that this dataset has a single value column, so input_dim is 1. The 10-unit encoding layer is therefore wider than the input, and in this configuration the network does not actually compress the data.
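The code above trains on the raw temperature values and on every row, including any anomalous ones. A common refinement, in line with training the autoencoder on typical data, is to fit a scaler on a portion of the series believed to be normal and then apply it to the whole series. Inputs then share a comparable scale, and readings outside the training range stand out as scaled values above 1 or below 0. The helpers `fit_min_max` and `apply_min_max`, the readings, and the split point below are hypothetical.

```python
import numpy as np

def fit_min_max(train):
    """Return per-feature (min, range) computed on the training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return lo, span

def apply_min_max(x, lo, span):
    """Scale x with statistics from fit_min_max; values may fall outside [0, 1]."""
    return (x - lo) / span

# Hypothetical temperature readings, one feature per row.
readings = np.array([[70.0], [72.0], [74.0], [71.0], [90.0]], dtype=np.float32)
train = readings[:4]  # assume the first rows are known to be normal
lo, span = fit_min_max(train)
scaled = apply_min_max(readings, lo, span)
print(scaled.ravel())  # the last reading lands well above 1.0
```

Fitting the statistics on the normal portion only keeps an extreme reading from stretching the scale and hiding itself.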

Python3

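# Flag the top 1% of reconstruction errors as anomalies
threshold = anomaly_scores.quantile(0.99)
anomalous = anomaly_scores > threshold

# NOTE: binary_labels is derived from `anomalous` itself, so the metrics below
# compare the predictions with themselves and are always 1.0. Replace it with
# ground-truth labels for a real evaluation.
binary_labels = anomalous.astype(int)

precision, recall, f1_score, _ = precision_recall_fscore_support(
    binary_labels, anomalous, average='binary')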

Here, we set the anomaly detection threshold at the 99th percentile of the anomaly scores and assess the model using precision, recall, and F1 score. Precision is the ratio of true positives to all predicted positives, whereas recall is the ratio of true positives to all actual positives. The F1 score is the harmonic mean of precision and recall.
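These definitions can be checked with a small NumPy sketch. It also shows why comparing predictions with labels derived from those same predictions always yields 1.0. The sketch mirrors the steps in the code (a quantile threshold, then binary precision, recall, and F1) but uses hand-made scores and hypothetical ground-truth labels. `precision_recall_f1` is a helper written for this illustration, not a scikit-learn function.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 from 0/1 (or boolean) label arrays."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # flagged and truly anomalous
    fp = np.sum(~y_true & y_pred)   # flagged but normal
    fn = np.sum(y_true & ~y_pred)   # anomalous but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)

# Hand-made anomaly scores and hypothetical ground-truth labels for ten points.
scores = np.array([0.1, 0.2, 0.1, 0.9, 0.3, 0.2, 0.8, 0.1, 0.2, 0.4])
truth = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 1])

threshold = np.quantile(scores, 0.8)  # flag roughly the top 20% of scores
predicted = scores > threshold

print(precision_recall_f1(predicted, predicted))  # (1.0, 1.0, 1.0)
print(precision_recall_f1(truth, predicted))      # (0.5, 0.5, 0.5)
```

The first call reproduces the situation in the article's code: the "labels" are the predictions, so the scores are perfect regardless of model quality. The second call, against independent labels, is the meaningful one.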

Python3

test = data_converted['value'].values
predictions = anomaly_scores.values

print("Precision: ", precision)
print("Recall: ", recall)
print("F1 Score: ", f1_score)

Output:

Precision:  1.0
Recall:  1.0
F1 Score:  1.0

Visualizing the Anomaly

Now let’s plot the anomalies predicted by the model. We mark the anomalous points with red dots on top of the full series to get a feel for whether the predictions are plausible.

Python3

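plt.figure(figsize=(16, 8))

# Parse the timestamp strings so matplotlib draws a proper date axis
timestamps = pd.to_datetime(data_converted['timestamp'])
plt.plot(timestamps,
         data_converted['value'])

# Mark the points flagged as anomalous with red dots
plt.plot(timestamps[anomalous],
         data_converted['value'][anomalous], 'ro')

plt.title('Anomaly Detection')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()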

Output:

Anomaly represented with red dots on time series data


Last Updated: 09 Jun, 2023
