About Clustering Models

A clustering model is an unsupervised learning algorithm that groups similar objects or similar attributes. For example, you can use it to identify an operation in a production process, or to find attributes that behave similarly.

K-Means

K-means clustering partitions n observations into k clusters so that each observation belongs to the cluster with the nearest mean. This results in a partitioning of the data space into Voronoi cells.
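
The assign-to-nearest-mean idea can be sketched in a few lines of Python. This is a minimal illustration of the standard iterative procedure (Lloyd's algorithm), not the tool's own implementation. The function name `kmeans`, the sample data, and the choice to start from the first k observations are assumptions made to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Partition points into k clusters (illustrative sketch only)."""
    # Assumption: start from the first k observations so the run is repeatable.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # assignments are stable, so the means no longer move
        labels = new_labels
    return labels, centroids


# Hypothetical numerical attributes for six observations.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each observation ends up labelled with the index of its nearest centroid, which is the Voronoi partition described above.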

To launch this model tool, select Models > K-Means from the menu. Alternatively, click the corresponding icon in the sidebar.

Tip: Type of variable

K-Means models can only be created with numerical attributes.


Tip: Create attribute set

You can create a new attribute set from the input of the model by clicking the corresponding icon.

Create a K-Means

The parameters for this method are defined as follows: 

  1. Select a Data source from the list.
  2. Select a Learning set from the list. Select a Test set from the list.
  3. Select attribute(s) from the list for the Input.
  4. Enter a name for your model. The default prefix is "CLUSTER-".
  5. Enter a Cluster number (default 3).
  6. Select Cluster silhouette.
  7. Select Search Cluster number.
  8. Click Compute to generate the clusters.
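
This page does not describe how the Cluster silhouette and Search Cluster number options are computed. As a hedged illustration only, the sketch below shows the standard silhouette coefficient and one plausible way to use it to compare candidate cluster numbers. The function name `silhouette_score`, the sample data, and the hand-written candidate labelings are all assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all observations (standard definition)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical data and two candidate labelings (2 clusters vs. 3 clusters).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
}
scores = {k: silhouette_score(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k)  # 2
```

A score close to 1 means observations sit much nearer to their own cluster than to any other, so the cluster number with the highest mean score is a reasonable candidate.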

Tip: Visualize K-Means results

To visualize the K-Means results, use the scatter plot: choose the attributes for the x and y axes, then set the condition to the Nearest CLUSTER-Name.



Subclu

Subclu is an unsupervised clustering algorithm used to define groups or patterns within the data based on the density of data points. It marks as outliers the points that lie alone in low-density regions. Clusters are built up one dimension at a time: each cluster is extended into a dimension that is already known to contain a cluster, so the result differs from the previous clusters in only one dimension. Therefore, it is not necessary to define the number of clusters as in K-Means.
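
The density idea can be illustrated with a DBSCAN-style pass restricted to a chosen set of attributes. This is a rough sketch and not the Subclu search itself: it handles a single subspace and omits the step that combines subspaces. The names `density_clusters`, `dims`, `eps` and `min_pts` are hypothetical; `min_pts` is the usual DBSCAN neighbour threshold, and how it relates to the tool's Maximum number of points setting is not stated on this page.

```python
import math

NOISE = -1  # label for outliers in low-density regions


def density_clusters(points, dims, eps, min_pts):
    """DBSCAN-style clustering using only the attribute indices in dims."""
    proj = [tuple(p[d] for d in dims) for p in points]

    def neighbours(i):
        # All observations within eps of observation i (including i itself).
        return [j for j, q in enumerate(proj) if math.dist(proj[i], q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical data: groups exist in attribute 0 but not in the full space.
points = [(0.0, 5.0), (0.05, 1.0), (0.1, 9.0),
          (1.0, 3.0), (1.05, 7.0), (1.1, 0.0), (5.0, 4.0)]
print(density_clusters(points, dims=(0,), eps=0.15, min_pts=2))
# [0, 0, 0, 1, 1, 1, -1]
print(density_clusters(points, dims=(0, 1), eps=0.15, min_pts=2))
# [-1, -1, -1, -1, -1, -1, -1]
```

In this example two clusters and one outlier appear when only the first attribute is considered, while every point is an outlier in the full two-attribute space. The number of clusters is never passed in; it follows from the density settings.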

Create Subclu

The parameters for this method are defined as follows: 

  1. Select a Data source from the list.
  2. Select a Learning set from the list.
  3. Select attribute(s) from the list for the Input.
  4. Enter a Cluster name prefix. The default prefix is "CLUSTER-".
  5. Enter a Maximum number of points (default 10).
  6. Enter Epsilon (default 0.1).
  7. Select Cluster silhouette: yes or no.
  8. Click Compute to generate the clusters.

Hierarchical Clustering

Hierarchical clustering is a model that is viewed as a dendrogram. A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. For more information, see Dendrograms.
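
To show what a dendrogram records, the sketch below builds the merge history of a small agglomerative clustering. It assumes single linkage (distance between the closest members) and Euclidean distance; the linkage used by the tool is not stated here, and the name `agglomerate` and the sample data are hypothetical.

```python
import math


def agglomerate(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    # Every observation starts as its own cluster, identified by its index.
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Single linkage: distance between the two closest members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # The merged cluster gets a new id; this is one node of the dendrogram.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Hypothetical one-attribute data.
points = [(0.0,), (1.0,), (5.0,), (5.5,)]
for left, right, height in agglomerate(points):
    print(left, right, height)
```

Each row gives the two clusters that were joined and the distance at which they were joined. Drawn as a tree, the rows become the branches of the dendrogram and the distances become the branch heights.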