About Clustering Models

A clustering model is an unsupervised learning algorithm that groups similar objects or similar attributes. For example, you can use a clustering model to identify an operation in a production process, or to find attributes that behave similarly.

K-Means

K-Means clustering is a method that partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean.
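The partitioning described above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard assign-then-update loop (Lloyd's algorithm), not the tool's own implementation. The function name `kmeans`, the sample points, and the choice to seed the centroids with the first k points are assumptions made only for this example.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Assign each point to the nearest mean, then recompute the means."""
    # Assumption: the first k points seed the centroids so the example is
    # deterministic; real implementations usually pick seeds differently.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

# Hypothetical sample data: two well-separated groups of numerical values.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.8, 1.1), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The `k` and `max_iterations` arguments play the same role as a cluster number and a maximum number of iterations: the loop stops as soon as the assignments stop changing, or when the iteration limit is reached.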

To launch this model tool, select Models > K-Means from the menu. Alternatively, click the corresponding icon in the sidebar.

Tip: Type of variable

K-Means models can only be created with numerical attributes.


Tip: Create attribute set

You can create a new attribute set from the inputs of the model by clicking the corresponding icon.

Create a K-Means

The parameters for this method are defined as follows: 

  1. Enter a Model name.
  2. Select a Datasource from the list.
  3. Select a Learning set from the list.
  4. Enter a Cluster name prefix. The default prefix is "CLUSTER".
  5. Enter a Cluster number. The default is 3. For more information, see Cluster number.
  6. Enter a Maximum number of iterations. For more information, see Maximum number of iterations.
  7. Select the Cluster silhouette checkbox. For more information, see Cluster silhouette.
  8. Select one or more attributes from the list for the Inputs.
  9. Click Save to generate the clusters.
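This section does not define what the Cluster silhouette option computes. Assuming it refers to the standard silhouette coefficient, the sketch below shows how that value is usually calculated: for each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b). The function name `silhouette_scores` and the sample data are hypothetical.

```python
import math

def silhouette_scores(points, labels):
    """Return one silhouette value per point (requires at least two clusters)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, hence len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Hypothetical data: the same points scored under a good and a poor labelling.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
good = silhouette_scores(points, [0, 0, 0, 1, 1, 1])
poor = silhouette_scores(points, [0, 1, 0, 1, 0, 1])
mean_good = sum(good) / len(good)
mean_poor = sum(poor) / len(poor)
print(mean_good, mean_poor)
```

Values close to 1 indicate compact, well-separated clusters. Values near 0 or below suggest that points sit between clusters or are assigned to the wrong one.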

Subclu

Subclu is an unsupervised clustering algorithm that defines groups or patterns within the data based on the density of data points. It marks as outliers the points that lie alone in low-density regions. Clusters are expanded one dimension at a time: a cluster is extended into a dimension that is known to contain a cluster differing from the previous clusters in only one dimension. Because clusters are derived from density, it is not necessary to define the number of clusters, as it is in K-Means.
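To make the density idea concrete, here is a minimal sketch of a density-based pass in the style of DBSCAN on a fixed set of attributes. It covers only how an epsilon radius and a minimum number of points separate dense regions from outliers. It does not implement the dimension-by-dimension expansion, and it is not necessarily how the tool computes its clusters. The function name `dbscan` and the sample data are assumptions.

```python
import math

def dbscan(points, epsilon, min_points):
    """Return one label per point; -1 marks an outlier."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within epsilon of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= epsilon]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # too sparse to start a cluster
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a dense point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # j is dense too, keep growing
        cluster += 1
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05),
          (1.0, 1.0), (1.05, 1.0), (1.0, 1.05),
          (5.0, 5.0)]
labels = dbscan(points, epsilon=0.1, min_points=3)
print(labels)
```

The number of clusters is never passed in: it falls out of how many dense regions the epsilon and minimum-points settings produce, and the isolated point is labelled as an outlier.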

Create Subclu

The parameters for this method are defined as follows: 

  1. Enter a Model name.
  2. Select a Datasource from the list.
  3. Select a Learning set from the list.
  4. Select one or more attributes from the list for the Input.
  5. Enter a Cluster name prefix. The default prefix is "CLUSTER-".
  6. Enter an Attribute suffix. The default suffix is "SUBCLU-1 Cluster".
  7. Enter Epsilon. The default is 0.1. For more information, see Epsilon.
  8. Select Cluster silhouette: yes or no. For more information, see Cluster silhouette.
  9. Enter Minimum points. The default is 10. For more information, see Minimum points.
  10. Enter Min silhouetteDimensions. For more information, see Min silhouetteDimensions.
  11. Click Save to generate the clusters.

Tip: Visualize K-Means and Subclu results

To visualize the K-Means and Subclu results, use the scatter plot: choose the attributes for the x and y axes, then set the condition to Nearest CLUSTER-Name.

Hierarchical Clustering

A hierarchical clustering model is viewed as a dendrogram, a tree diagram frequently used to illustrate the arrangement of the clusters that hierarchical clustering produces. For more information, see Dendrograms.
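As an illustration of where a dendrogram comes from, the sketch below performs bottom-up (agglomerative) clustering and records each merge together with the distance at which it happened. Those merge records are what a dendrogram draws as branches. This section does not state which linkage rule the tool uses, so single linkage (the distance between the closest members) is an assumption here, as are the function name `agglomerate` and the sample data.

```python
import math

def agglomerate(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    # Every point starts as its own cluster, identified by its index.
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Single linkage: distance between the two closest members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the closest pair into a new cluster with a fresh id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

# Hypothetical data: four points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.0)]
merges = agglomerate(points)
for left, right, distance in merges:
    print(left, right, distance)
```

Each tuple reads as "cluster `left` and cluster `right` were joined at height `distance`". Ids 0 to 3 are the original points, and higher ids are clusters created by earlier merges. Cutting the tree at a chosen height yields a flat set of clusters.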