Monitoring Models

About Monitoring Models

ISHM (Inductive System Health Monitoring) is a monitoring model: an unsupervised learning algorithm that enables a system to detect anomalies in a process and to perform root-cause analysis based on Failure Mode and Effects Analysis (FMEA).

FMEA is a systematic method for evaluating how a process might fail and the impact of its different failure modes.

ISHM Analysis

The objective of ISHM is to build a knowledge base of clusters of related value ranges for the input parameters. Each cluster defines a range of allowable values for each parameter in a given input vector. Points that fall inside a cluster are considered to be within the system operating range; points further away can be considered outliers.
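To make the idea of a cluster as a set of allowable ranges concrete, here is a minimal sketch. It assumes the learning-set points assigned to one cluster are already known, summarizes the cluster as a per-parameter min, max and center (the center is taken here as the mean, which is an assumption), and checks whether a new point falls inside every range. The names `Box`, `box_from_members` and `inside` are hypothetical and are not part of DATAmaestro.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    """Hypothetical ISHM cluster summary: one range per input parameter."""
    low: List[float]     # per-parameter minimum of the cluster members
    high: List[float]    # per-parameter maximum of the cluster members
    center: List[float]  # per-parameter mean (assumed definition of "center")


def box_from_members(members: Sequence[Sequence[float]]) -> Box:
    """Summarize the learning-set points assigned to one cluster."""
    columns = list(zip(*members))  # one tuple of values per parameter
    return Box(
        low=[min(c) for c in columns],
        high=[max(c) for c in columns],
        center=[sum(c) / len(c) for c in columns],
    )


def inside(box: Box, point: Sequence[float]) -> bool:
    """True if every parameter of the point lies within the box's range."""
    return all(lo <= x <= hi for lo, x, hi in zip(box.low, point, box.high))


# Two healthy observations of (temperature, pressure) forming one cluster
box = box_from_members([[20.0, 1.0], [24.0, 2.0]])
print(box.low, box.high, box.center)  # [20.0, 1.0] [24.0, 2.0] [22.0, 1.5]
print(inside(box, [22.0, 1.1]), inside(box, [30.0, 1.1]))  # True False
```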

DATAmaestro Analytics implements three clustering algorithms:

  • K-Means: a method that partitions n observations into k clusters, in which each observation belongs to the cluster with the nearest mean. This partitions the data space into cells, each containing the points closest to one cluster mean. The number of clusters k must be defined beforehand. The algorithm cannot be used with arbitrary distance functions or on non-numerical data.
  • Subclu: SUBCLU stands for density-connected SUBspace CLUstering. It builds on the density-based algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise): given a set of points in some space, it groups points that are closely packed together into clusters and marks points that lie alone in low-density regions as outliers. SUBCLU looks for such clusters in subspaces, that is, in subsets of the input variables.
  • IMS: IMS builds small clusters, each characterized by its min, max and center, which become the min, max and center of the ISHM boxes. When a new data point lies between the min and max values in every dimension, it is considered inside the box and its distance is 0. If the data point is not within any box, it is associated with the nearest cluster based on the min and max boundaries.
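The IMS distance rule can be sketched as follows, assuming a Euclidean distance to the box boundaries (the metric DATAmaestro actually uses is not stated on this page). Inside a box the distance is 0; otherwise each parameter contributes how far it lies below the box min or above the box max, and the point is associated with the box with the smallest distance. `box_distance` and `nearest_box` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format: (low, high) lists with one entry per parameter
BoxBounds = Tuple[List[float], List[float]]


def box_distance(low: Sequence[float], high: Sequence[float],
                 point: Sequence[float]) -> float:
    """Euclidean distance from a point to a box; 0 if the point is inside."""
    total = 0.0
    for lo, x, hi in zip(low, point, high):
        # How far the value sits below the min or above the max (0 if in range)
        gap = max(lo - x, 0.0, x - hi)
        total += gap * gap
    return math.sqrt(total)


def nearest_box(boxes: Sequence[BoxBounds], point: Sequence[float]) -> Tuple[int, float]:
    """Return (index, distance) of the box closest to the point."""
    distances = [box_distance(lo, hi, point) for lo, hi in boxes]
    best = min(range(len(boxes)), key=distances.__getitem__)
    return best, distances[best]


boxes = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [6.0, 6.0])]
print(nearest_box(boxes, [0.5, 0.5]))  # (0, 0.0): inside the first box
print(nearest_box(boxes, [4.0, 5.5]))  # (1, 1.0): 1 below the second box's min
```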

Learning set empty

Some models in DATAmaestro can handle missing values, while others cannot. For example, the clustering methods used by ISHM, such as K-means, cannot handle missing values: if a row has a missing value, even for a single variable, the algorithm ignores the entire row. The message “Learning set is empty” indicates that the algorithm removed every row because each one had at least one missing value. If any variables have a high number of missing values, it is recommended to remove them or to use the “Fill missing values” tool under the “Transform” menu in DATAmaestro Analytics.
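The row-removal behavior (listwise deletion) can be illustrated with a short sketch. It is not DATAmaestro code: the helper names are hypothetical, None and NaN are assumed to represent missing values, and the error text simply mimics the message the tool shows. It also computes the share of missing values per variable, which helps decide which variables to remove or fill.

```python
import math
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing (an assumption about the data encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_rows(rows: Sequence[Row]) -> List[Row]:
    """Keep only rows where every variable has a value (listwise deletion)."""
    kept = [row for row in rows if not any(is_missing(v) for v in row)]
    if not kept:
        # Every row had at least one missing value, so nothing is left to learn from
        raise ValueError("Learning set is empty")
    return kept


def missing_share(rows: Sequence[Row]) -> List[float]:
    """Fraction of missing values per variable (column)."""
    return [sum(is_missing(row[c]) for row in rows) / len(rows)
            for c in range(len(rows[0]))]


rows = [[1.0, None, 3.0], [2.0, None, 4.0], [5.0, 6.0, float("nan")]]
print(missing_share(rows))  # the middle variable is missing in 2 of 3 rows
try:
    complete_rows(rows)
except ValueError as err:
    print(err)  # Learning set is empty
# Removing the mostly-missing middle variable leaves two complete rows
print(complete_rows([[r[0], r[2]] for r in rows]))  # [[1.0, 3.0], [2.0, 4.0]]
```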

Calculation time

In DATAmaestro, calculation time varies with the number of records, the number of input variables and the algorithm used. For ISHM, for example, Subclu is significantly slower than K-means on larger data sets, so K-means is recommended for large datasets.

To launch this model tool, select Models > ISHM from the menu.

Create an ISHM Analysis

The parameters for this method are defined on two tabs at the top of the page: Properties and Advanced.

On the Properties tab: 

  1. Select a Datasource from the list (if applicable).
  2. Enter a Model Name.
  3. Select a Learning set from the list. In ISHM, the learning set is a set of observations of healthy operation; it maps the system operating range. Use a visualization tool to define the healthy record set.
  4. Select a Testing set from the list. The testing set should contain all types of records (healthy and unhealthy). When the system detects that a record lies outside the healthy clusters, it calculates a distance and identifies the variables that cause the deviation.
  5. Select a Model type: Kmeans, Subclu or IMS.
  6. If Model Type Kmeans:
    1. Enter Number of Clusters (Default value: 5).
    2. Enter Maximum number of iterations (Default value: 100). For more information, see Maximum number of iterations.
  7. If Model Type Subclu:
    1. Enter Epsilon (Default value: 1). For more information, see Epsilon.
    2. Enter Minimum number of points per cluster (Default value: 10). For more information, see Minimum number of points per cluster.
    3. Enter Min cluster Dimensions (Default value: 1). For more information, see Min cluster Dimensions.
    4. Enter Number of nearest points (Default value: 1). For more information, see Number of nearest points.
  8. If Model Type IMS:
    1. Enter Epsilon (Default value: 0.1). Epsilon is the maximum distance for a point to belong to a cluster. A larger value tends to produce fewer clusters.
  9. Enter a Variable prefix.
  10. Enter a Distance variable name (Default value: ISHM-distance).
  11. Enter Number of Causes (Default value: 3). For more information, see Number of causes.
  12. Select a Variable Set, if required.
  13. Select variable(s) from the list for the Input.
  14. Select variable(s) from the list for the Cond (Condition).
  15. Select a variable from the list as an Index.
  16. Click Save.
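The K-means options (Number of Clusters, default 5; Maximum number of iterations, default 100) can be illustrated with a minimal sketch of Lloyd's algorithm. It is a generic textbook k-means, not DATAmaestro's implementation: initializing the centers from randomly sampled data points, stopping early when assignments no longer change, and the function name `kmeans` are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], n_clusters: int = 5,
           max_iterations: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means; returns (cluster centers, cluster label of each point)."""
    rng = random.Random(seed)
    # Initialize centers with n_clusters data points drawn without replacement
    centers = [list(p) for p in rng.sample(list(points), n_clusters)]
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest mean
        new_labels = [min(range(n_clusters), key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged early
        labels = new_labels
        # Update step: move each center to the mean of its members
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, labels = kmeans(points, n_clusters=2)
print(labels)  # the first three points share one label, the last three the other
```

With well-separated data like this, the loop converges in a few iterations, well before the iteration limit; the limit only matters when assignments keep changing.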

On the Advanced tab: 

  1. Enter a Conditional class count (Default value: 3), based on cause frequency, cause importance, or both frequency and importance.
  2. Select the Temporal Units used for Trends: Excel time, Mac Excel time, Unix time (ms) or Unix time (s).
  3. Select a Cluster Standardisation:
    1. Normalize: transform the variable to have a max of 1 and a min of 0. The value is calculated with: scaled(x) = (x - min)/(max - min) where the min and max values are based on the learning data set.
    2. Standardize: transform the variable to have a mean of 0 and a standard deviation of 1. The value is calculated with: scaled(x) = (x - µ)/(STDEV) where the average (µ) and standard deviation (STDEV) values are calculated on the learning data set.
  4. Select a Keep Predict Output option:
    1. Keep all: keeps all predicted output variables, namely ISHM-actual, ISHM-predict, ISHM-predict-high and ISHM-predict-low.
    2. Remove predict: removes the output variable ISHM-predict, which is the average of ISHM-predict-high and ISHM-predict-low.
    3. Remove predict and actual: removes the output variables ISHM-predict and ISHM-actual (ISHM-actual is the input value of the variable).
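The two Cluster Standardisation formulas can be sketched as follows. The key point is that min, max, mean and standard deviation are computed on the learning set only and then reused for any other record, so testing-set values outside the learning range fall outside [0, 1] after normalization. The function `fit_scaler` is hypothetical, and using the sample standard deviation (Python's `statistics.stdev`) rather than the population one (`statistics.pstdev`) is an assumption.

```python
import statistics
from typing import Callable, Sequence


def fit_scaler(learning: Sequence[float],
               method: str = "normalize") -> Callable[[float], float]:
    """Build a scaling function whose parameters come from the learning set only."""
    if method == "normalize":
        lo, hi = min(learning), max(learning)
        if hi == lo:
            raise ValueError("constant variable: max equals min")
        # scaled(x) = (x - min) / (max - min)
        return lambda x: (x - lo) / (hi - lo)
    if method == "standardize":
        mu = statistics.mean(learning)
        sd = statistics.stdev(learning)  # sample standard deviation (assumed)
        if sd == 0:
            raise ValueError("constant variable: standard deviation is 0")
        # scaled(x) = (x - mean) / stdev
        return lambda x: (x - mu) / sd
    raise ValueError(f"unknown method: {method}")


learning = [10.0, 20.0, 30.0]
norm = fit_scaler(learning, "normalize")
std = fit_scaler(learning, "standardize")
# Values outside the learning range fall outside [0, 1] after normalization
print([norm(x) for x in (10.0, 20.0, 40.0)])  # [0.0, 0.5, 1.5]
print([std(x) for x in (10.0, 20.0, 40.0)])   # [-1.0, 0.0, 2.0]
```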