
Simplifying Image Outlier Detection with Alibi Detect

Giannis Tolios
Towards Data Science
6 min read · Oct 13, 2020


Photo by Jessica Ruscello on Unsplash

Outlier noun

out·​li·​er | \ ˈau̇t-ˌlī(-ə)r \
1 : a statistical observation that is markedly different in value from the others of the sample
2 : a person or thing that is atypical within a particular group, class, or category

Outlier detection is the identification of dataset elements that differ significantly from the majority. Those elements are known as outliers, and the reasons for detecting them depend on the domain. A typical example is fraud detection, where outliers in a financial dataset may indicate fraudulent activity, such as transactions with stolen credit cards. Outlier detection may also be used in network intrusion detection, where the outliers are records of suspicious network activity that indicate possible attempts to gain unauthorized access. These are examples of outlier detection applied to tabular data, but it can also be used with other data types, such as images. Quality control in industrial manufacturing is one such case, where outlier detection is used to identify defects in product images.

There are three basic approaches to outlier detection. First, unsupervised methods don't require any information about the properties of either the outliers or the normal elements (known as inliers). Semi-supervised methods, on the other hand, are trained on a set of normal elements, and are then able to detect elements that differ significantly from them. Finally, supervised methods require a labeled dataset with both inliers and outliers, which makes them similar to classification algorithms designed for imbalanced classes. For a more detailed introduction to outlier detection in general, I suggest that you read this article.

Alibi Detect

Alibi Detect is an open source Python library for outlier, adversarial and drift detection that includes a variety of powerful algorithms and techniques. It also supports various data types, such as tabular data, time series and images. Here's a list of all the outlier detection algorithms included with Alibi Detect, followed by a table indicating the possible uses for each one of them. More information about each algorithm, as well as the associated research papers, is included in the linked documentation pages.

Outlier Detection Algorithms

Suggested use for each algorithm included in the Alibi Detect library

Autoencoders

Autoencoders are a type of artificial neural network composed of an encoder and a decoder. The encoder transforms the input to a latent space representation, while the decoder receives that representation and outputs a reconstruction of the input data¹. There are various applications for autoencoders, such as dimensionality reduction and image denoising, but in this article we are going to focus on image outlier detection.

A simple autoencoder architecture — diagram from Comp Three Inc

For image outlier detection, the network used is a convolutional autoencoder, so called because both the encoder and the decoder are convolutional neural networks. The autoencoder is trained on an image dataset, after which it is able to reconstruct similar images that are provided as input. If the difference between the input image and the reconstructed output is high, the image can be flagged as an outlier².
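To make the reconstruction idea concrete, here is a minimal NumPy sketch. It is not the code used by Alibi Detect; the function names, the toy data and the threshold of 0.01 are assumptions chosen purely for illustration, and the "reconstructions" are simulated rather than produced by a trained network.

```python
import numpy as np

def reconstruction_scores(images, reconstructions):
    """Mean squared error per image between inputs and their reconstructions."""
    images = np.asarray(images, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    squared_error = (images - reconstructions) ** 2
    # Average over every axis except the first (the batch axis).
    return squared_error.reshape(len(images), -1).mean(axis=1)

def flag_outliers(scores, threshold):
    """Flag every image whose score exceeds the threshold."""
    return scores > threshold

# Toy data: four 8x8 RGB "images" with values around 0.5.
rng = np.random.default_rng(0)
images = rng.uniform(0.4, 0.6, size=(4, 8, 8, 3))
# Pretend the autoencoder reproduces every image almost perfectly...
reconstructions = images + rng.normal(0.0, 0.01, size=images.shape)
# ...except for a patch of the last one (a simulated defect it cannot rebuild).
reconstructions[3, 2:6, 2:6, :] += 0.5

scores = reconstruction_scores(images, reconstructions)
is_outlier = flag_outliers(scores, threshold=0.01)  # assumed threshold
print(is_outlier)
```

Only the last image ends up above the threshold, because its reconstruction differs strongly from the input in the defective patch.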

The MVTec AD Dataset

Testing the accuracy of an outlier detection model in real-world conditions can be challenging, as the number of outliers in a dataset is typically unknown to us. We can overcome this obstacle by training and evaluating our models on a dataset that was created specifically for testing purposes.

Example images of the MVTec AD dataset — image provided by MVTec Software GmbH

The MVTec AD dataset contains thousands of high-resolution images, and is suitable for testing and benchmarking image outlier models, with a focus on industrial manufacturing quality control. The dataset consists of 15 categories of images, such as carpet, leather, transistor and screw. The training set for each category includes normal images only, while the test set has both normal images and outliers with various defects. For additional information about the dataset and the benchmarking results for various outlier detection algorithms, you can refer to the associated research paper³.
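Because the test set mixes normal and defective images, ground-truth labels can be derived from the folder structure. The sketch below assumes the extracted archive stores test images as PNG files under `<category>/test/good` for normal images and `<category>/test/<defect type>` for outliers; the helper name `label_test_images` is hypothetical, and the layout should be checked against your own copy of the dataset.

```python
from pathlib import Path

def label_test_images(category_dir):
    """Return (path, is_outlier) pairs for every PNG in <category_dir>/test."""
    labelled = []
    for image_path in sorted(Path(category_dir, "test").glob("*/*.png")):
        # The parent folder name is the defect type; "good" means no defect.
        labelled.append((image_path, image_path.parent.name != "good"))
    return labelled
```

Calling `label_test_images("capsule")` on such a layout would return one pair per test image, which can later be compared with the detector's predictions.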

Outlier Detection with Alibi Detect

We are now going to create an image outlier detection model, based on the autoencoder algorithm of the Alibi Detect library. The model will be trained and tested on the capsule images of the MVTec AD dataset, following the semi-supervised approach, as the training set consists of normal (inlier) images only.

We start by importing the necessary libraries, classes and functions. After that, we create a function that loads all the images from a given path, and converts them to a numpy array. We use that function to create the train and test sets for our model.
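A loading function along these lines could look like the sketch below. It is an illustration rather than the article's original code: the name `load_images`, the use of Pillow, the 64x64 target size and the scaling to the [0, 1] range are all assumptions.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def load_images(folder, size=(64, 64)):
    """Load every PNG in `folder` as an RGB float array scaled to [0, 1]."""
    arrays = []
    for image_path in sorted(Path(folder).glob("*.png")):
        with Image.open(image_path) as image:
            # Pillow takes size as (width, height); the array is (height, width, 3).
            resized = image.convert("RGB").resize(size)
            arrays.append(np.asarray(resized, dtype=np.float32) / 255.0)
    if not arrays:
        raise ValueError(f"no PNG images found in {folder}")
    # Stack into one array of shape (n_images, height, width, 3).
    return np.stack(arrays)
```

Under the assumed folder layout, the train set would then be something like `load_images("capsule/train/good")`, with the test set assembled the same way from the test folders. Downscaling keeps memory use and training time manageable, at the cost of losing fine detail that may matter for small defects.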

We define the encoder and decoder parts of the convolutional autoencoder with the TensorFlow/Keras API. We then instantiate the OutlierAE detector class, which takes the encoder and decoder networks as input, and train the model on the training set. We also need to define a threshold value, above which an element is flagged as an outlier. We calculate the threshold with the infer_threshold function, which takes the percentage of inlier values as a parameter. This is convenient, but it requires knowing that percentage in advance, which is not always possible in real-world conditions. After that, we detect the outliers of the test set with the predict function, which returns a dictionary with predictions for each element. Within its data entry, the instance_score key contains the instance-level score, and an element is flagged as an outlier if its score is above the threshold. Furthermore, the feature_score key contains the score of each individual pixel of the image.

First, we copy all the images flagged as outliers to the img folder. Then, we create a pandas dataframe with the file names of all images and the detector predictions. We create a second dataframe that includes only the outliers, and print it. The model is fairly accurate: it detected all the outlier images and flagged only a few normal images as outliers (false positives).
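One possible way to organize these steps is sketched below. The helper names, the column names, and the example paths and scores are hypothetical and serve only as illustration; they are not the detector's real output.

```python
import shutil
from pathlib import Path

import numpy as np
import pandas as pd

def summarize_predictions(file_paths, is_outlier, instance_score):
    """Build one row per image with the detector's verdict and score."""
    return pd.DataFrame({
        "file": [str(p) for p in file_paths],
        "is_outlier": np.asarray(is_outlier).astype(bool),
        "instance_score": np.asarray(instance_score, dtype=float),
    })

def copy_flagged(file_paths, is_outlier, destination):
    """Copy every flagged image into `destination`, creating it if needed."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path, flagged in zip(file_paths, is_outlier):
        if flagged:
            path = Path(path)
            # Prefix the parent folder so equal file names do not collide.
            target = destination / f"{path.parent.name}_{path.name}"
            shutil.copy(path, target)
            copied.append(target)
    return copied

# Hypothetical file names and detector output, for illustration only.
paths = ["test/good/000.png", "test/crack/000.png", "test/poke/001.png"]
df = summarize_predictions(paths, [0, 1, 1], [0.002, 0.031, 0.027])
outliers_df = df[df["is_outlier"]]
print(outliers_df)
```

Prefixing the copied file with its parent folder name is a design choice for the case where several source folders contain files with the same name, which would otherwise overwrite each other in a single destination folder.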

Finally, we use the plot_feature_outlier_image function to plot the score for each pixel of the outlier elements. This helps us better understand how the outlier detector works. The first column of the plot contains five images that were flagged as outliers. The second column shows each image as it was reconstructed by the outlier detector. Evidently, the model can only output a normal capsule image, and thus fails to reconstruct the various deformations. The next three columns are the feature score visualizations for each image channel, and can help us locate the problematic areas.
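The same per-pixel scores can also be inspected numerically. The sketch below assumes a feature score array with the same (height, width, channels) shape as the image; the helper names and the synthetic scores are assumptions made for illustration, and the code does not call the plotting function itself.

```python
import numpy as np

def defect_heatmap(feature_score):
    """Collapse a (height, width, channels) score array into one 2-D map."""
    return np.asarray(feature_score).mean(axis=-1)

def hottest_pixel(heatmap):
    """Return the (row, column) of the highest score in the heat map."""
    row, column = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(row), int(column)

# Synthetic scores: low everywhere except a small simulated defect.
feature_score = np.full((16, 16, 3), 0.001)
feature_score[10:12, 4:6, :] = 0.2
feature_score[11, 5, :] = 0.4

heatmap = defect_heatmap(feature_score)
print(hottest_pixel(heatmap))  # (11, 5)
```

Averaging across channels gives a single map per image, which is convenient when only the location of the defect matters and not the channel in which it shows up.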

Conclusion

Convolutional autoencoders are a viable and fairly accurate option for image outlier detection, but there is room for improvement. For example, you can try modifying the neural network architecture to get better results. You should also keep in mind that Alibi Detect includes other algorithms, such as the variational autoencoder and the autoencoding Gaussian mixture model, that may be suitable for specific cases. I encourage you to experiment and find the solution that best fits your needs. Feel free to share your thoughts in the comments, or follow me on LinkedIn where I regularly post content about data science and other topics. You can also visit my personal website or check my latest book, titled Simplifying Machine Learning with PyCaret.

References

[1] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016), MIT Press

[2] R. Chalapathy, S. Chawla, Deep Learning for Anomaly Detection: A Survey (2019), arXiv:1901.03407

[3] P. Bergmann, M. Fauser, D. Sattlegger, C. Steger, MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
