About Clustering Models

A clustering model is an unsupervised learning algorithm that groups similar objects or similar attributes. For example, you can use it to identify an operation in a production process or to find attributes that behave similarly.

K-Means

K-Means clustering is a method that partitions n observations into k clusters so that each observation belongs to the cluster with the nearest mean.
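
The partitioning described above can be illustrated with a minimal sketch of the standard iterative procedure (Lloyd's algorithm). This is not the tool's own implementation; the function name k_means, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Partition points (tuples of floats) into k clusters (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: start from k observations picked at random.
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means would not move either
        labels = new_labels
        # Update step: recompute each mean from its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical sample data: two well-separated groups of numerical observations.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (8.1, 7.9), (7.8, 8.0)]
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

The max_iterations argument plays the same role as a maximum number of iterations setting: it bounds the loop when the assignments do not stabilise earlier.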

To launch this model tool, select Models > K-Means from the menu. Alternatively, click the corresponding icon in the sidebar.

Type of variable

K-Means models can only be created with numerical attributes.

Create attribute set

You can create a new attribute set from the input of the model. Click the corresponding icon.

Create a K-Means

The parameters for this method are defined as follows:

Enter a Model name.
Select a Datasource from the list.
Select a Learning set from the list.
Enter a Cluster name prefix. The default prefix is "CLUSTER".
Enter a Cluster number. The default is 3. For more information, see Cluster number.
Enter a Maximum number of iterations. For more information, see Maximum number of iterations.
Select the Calculate cluster silhouette checkbox. For more information, see Calculate cluster silhouette.
Select the Optimize cluster number checkbox. For more information, see Search cluster number.
Select attribute(s) from the list for the Input.
Click Save to generate the clusters.
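
To give an idea of what a cluster silhouette measures, the following sketch computes the mean silhouette coefficient for a given assignment of points to clusters. It follows the usual textbook definition and assumes Euclidean distance; how the tool itself computes the silhouette, and whether it uses it to optimize the cluster number, is not stated here. The function name and sample data are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; values near 1 mean compact, well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(p, members):
        return sum(math.dist(p, q) for q in members) / len(members)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(mean_distance(p, m) for other, m in clusters.items() if other != lab)
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical sample data with one sensible and one poor cluster assignment.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (8.1, 7.9), (7.8, 8.0)]
good = silhouette_score(data, [0, 0, 0, 1, 1, 1])
poor = silhouette_score(data, [0, 1, 0, 1, 0, 1])
print(good, poor)
```

One common way to choose a cluster number is to compute this score for several candidate values and keep the one with the highest silhouette.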

Subclu

Subclu is an unsupervised clustering algorithm that finds groups or patterns in the data based on the density of data points. It marks as outliers the points that lie alone in low-density regions. Clusters are built up one dimension at a time: each step extends a set of dimensions that is already known to contain a cluster by a single additional dimension. Because clusters are derived from density, it is not necessary to define the number of clusters as in K-Means.

Create Subclu

The parameters for this method are defined as follows:

Enter a Model name.
Select a Datasource from the list.
Select a Learning set from the list.
Select attribute(s) from the list for the Input.
Enter a Cluster name prefix. The default prefix is "CLUSTER-".
Enter Epsilon. The default is 0.1. For more information, see Epsilon.
Enter Minimum points. The default is 10. For more information, see Minimum points.
Select Cluster silhouette: yes or no. For more information, see Cluster silhouette.
Click Save to generate the clusters.
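
The role of the Epsilon and Minimum points parameters can be illustrated with a sketch of DBSCAN, the density-based clustering step that SUBCLU applies within subspaces. The sketch works on all dimensions at once and does not perform the subspace search, so it is not a full Subclu implementation. It assumes that Epsilon is a neighbourhood radius and Minimum points is the number of neighbours required for a dense region; the names dbscan, NOISE, and the sample data are hypothetical.

```python
import math

NOISE = -1  # label used here for outliers


def dbscan(points, epsilon, min_points):
    """Label each point with a cluster id (0, 1, ...) or NOISE (illustrative sketch)."""

    def neighbours(i):
        # Indices of all points within epsilon of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= epsilon]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
data = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(data, epsilon=0.2, min_points=3)
print(labels)
```

In this sketch the number of clusters is never passed in: it follows from how many dense regions the two parameters reveal, and the isolated point is labelled as an outlier.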

Visualize K-Means and Subclu results

To visualize the K-Means and Subclu results, use the scatter plot. Choose the attributes for the x and y axes, then set the condition to the Nearest CLUSTER-Name.

Hierarchical Clustering

Hierarchical clustering is a model whose result is viewed as a dendrogram. A dendrogram is a tree diagram that illustrates the arrangement of the clusters produced by hierarchical clustering. For more information, see Dendrograms.
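
As an illustration of where a dendrogram comes from, the following sketch performs agglomerative clustering: it starts with one cluster per observation and repeatedly merges the two closest clusters. Each merge, with the distance at which it happens, corresponds to one junction of the tree. The linkage rule used by the tool is not stated here; single linkage and Euclidean distance are assumptions, and the function name and sample data are hypothetical.

```python
import math


def single_linkage(points):
    """Return merges as (members_a, members_b, distance), in the order they occur."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges


# Hypothetical one-dimensional sample data.
data = [(0.0,), (1.0,), (5.0,), (7.0,)]
merges = single_linkage(data)
for left, right, height in merges:
    print(left, right, height)
```

Reading the merges from first to last corresponds to reading a dendrogram from the leaves towards the root; cutting the tree at a chosen height yields the clusters that exist at that distance.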