A Geometric Representation and Similarity Measure for Clustering Based Anomaly Detection in Industrial Automation Systems
Publication date
2020-12
Document type
PhD thesis (dissertation)
Author
Li, Peng
Advisor
Referee
Granting institution
Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
Exam date
2020-12-08
Organisational unit
Part of the university bibliography
✅
DDC Class
000 Informatik, Wissen & Systeme
Keyword
Data-driven condition monitoring
Anomaly detection
Abstract
Rapidly changing customer demands and increasing global competition require manufacturing companies to produce a variety of products more adaptively and efficiently by using more intelligent machinery. However, maintaining such advanced machinery is one of the major costs for manufacturing companies, especially in asset-heavy industries. The inefficiency of traditional maintenance strategies has created a demand for data-driven predictive maintenance methods.
As a classic unsupervised learning method, cluster analysis can group a given data set into meaningful subsets without prior knowledge, and it is easily adapted to different systems. However, most clustering methods assume either a particular statistical distribution or a particular structure of the given data. This assumption limits the applicability of clustering-based methods to problems such as condition monitoring and predictive maintenance of complex industrial systems, whose data are mostly non-convex.
The main goal of this work is to develop a more general representation and similarity measure for clustering-based anomaly detection. The representation should improve the accuracy of clustering-based anomaly detection methods and should be applicable to data sets with different structures (shapes), both convex and non-convex.
To achieve this, a non-convex hull-based representation and a corresponding similarity measure were introduced in this work. This solution makes no assumption about either the structure or the distribution of the given data and can therefore be combined with any kind of clustering method. Furthermore, it better reflects the natural structure of the given data and thus has high generality.
To learn a non-convex hull-based representation of given clusters, two novel algorithms were developed to compute $n$-dimensional non-convex hulls of given data. Furthermore, a new algorithm that determines the position of a point relative to an $n$-dimensional non-convex hull was proposed as the similarity measure between a test point and the normal behavior represented by the non-convex hulls.
The theoretical results on the non-convex hull-based representation and the corresponding similarity measure were validated with four real-world data sets as well as two sets of artificial data. The results show that the non-convex hull-based solution can significantly improve the accuracy of cluster-based anomaly detection on non-convex data sets. Furthermore, the performance of the non-convex hull-based solution on convex data sets is comparable to that of state-of-the-art cluster-based anomaly detection methods. The experimental results therefore testify to the high generality and practical applicability of the proposed approach.
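The difference between a convex and a non-convex boundary can be illustrated with a small two-dimensional sketch. It is not the thesis's $n$-dimensional algorithm: it uses the standard ray-casting point-in-polygon test, and the U-shaped boundary, the test points, and the function name `point_in_polygon` are hypothetical choices made only for this illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if p lies inside the simple polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through p.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of p; an odd count means inside.
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical U-shaped (non-convex) boundary of a "normal" cluster.
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
# Convex hull of the same vertices: the enclosing square.
square = [(0, 0), (3, 0), (3, 3), (0, 3)]

normal_point = (0.5, 2.0)  # lies in the left arm of the U
gap_point = (1.5, 2.0)     # lies in the notch between the two arms

in_u_normal = point_in_polygon(normal_point, u_shape)
in_u_gap = point_in_polygon(gap_point, u_shape)
in_square_gap = point_in_polygon(gap_point, square)
```

In this toy setting the point in the notch falls outside the non-convex boundary but inside the convex one, so a convex representation would accept it as normal while the non-convex representation would flag it.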
Version
Not applicable (or unknown)
Access right on openHSU
Open access