About Clustering Models
A clustering model is an unsupervised learning algorithm that groups similar objects (records) or similar attributes (variables). For example, you can use it to identify an operation in a production process, or to find attributes (variables) that behave similarly.
K-Means
K-means clustering is a method which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
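The partitioning described above can be sketched in a few lines of Python. This is a minimal illustration of the standard K-means procedure (Lloyd's algorithm), not the product's implementation. The function name `k_means`, its parameters, and the random initialisation are assumptions made for the example.

```python
import math
import random


def k_means(points, k=3, max_iterations=100, seed=0):
    """Partition points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k observations picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no observation changed cluster, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
    labels, centroids = k_means(data, k=2)
    print(labels)
    print(centroids)
```

In this sketch, `k` plays the role of the cluster number and `max_iterations` caps the number of assignment and update rounds.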
To launch this model tool, select Models > K-Means from the menu. Alternatively, click the corresponding icon in the sidebar.
Tip: K-Means models can only be created with numerical attributes (variables).
Tip: You can create a new attribute (variable) set from the inputs of the model. To do so, click the corresponding icon.
Create a K-Means
The parameters for this method are defined as follows:
- Enter a Model name.
- Select a Datasource from the list (if applicable).
- Select a Learning set from the list.
- Enter a Cluster name prefix. The default prefix is "CLUSTER".
- Enter a Cluster number, default 3. For more information, see Cluster number.
- Enter a Maximum number of iterations. For more information, see Maximum number of iterations.
- Select the Cluster silhouette checkbox. For more information, see Cluster silhouette.
- Select the attribute(s) (variables) from the list for the Inputs.
- Click Save to generate the clusters.
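As a rough illustration of what a cluster silhouette measures, the sketch below computes the commonly used silhouette value for each point: it compares the mean distance to the point's own cluster with the mean distance to the nearest other cluster. How the product computes or displays the silhouette is not stated here, so the function name `silhouette_scores` and the Euclidean distance are assumptions.

```python
import math


def silhouette_scores(points, labels):
    """Return one silhouette value per point, each between -1 and 1."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
    print(silhouette_scores(data, [0, 0, 0, 1, 1, 1]))
```

Values close to 1 indicate points that sit well inside their cluster; values near 0 or below suggest that a different cluster number may fit the data better.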
Subclu
Subclu is an unsupervised clustering algorithm used to define groups or patterns within the data based on the density of data points. It marks as outliers the points that lie alone in low-density regions. Each cluster is expanded one dimension at a time, into a subspace that is already known to contain a cluster and that differs from the previous subspace in only one dimension. Because clusters are derived from density, it is not necessary to define the number of clusters in advance, as it is in K-Means.
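The density idea can be illustrated with a DBSCAN-style pass over a chosen subset of dimensions. This is only a sketch of the density step for one subspace, not the full Subclu procedure and not the product's implementation. The names `density_cluster`, `eps`, `min_pts`, and `dims` are assumptions that loosely mirror the Epsilon and Minimum points parameters.

```python
import math


def density_cluster(points, eps=0.1, min_pts=10, dims=None):
    """DBSCAN-style clustering restricted to the dimensions listed in dims.

    Returns one label per point: 0, 1, 2, ... for clusters and -1 for outliers.
    """
    dims = list(range(len(points[0]))) if dims is None else list(dims)
    # Project every point onto the selected subspace.
    proj = [tuple(p[d] for d in dims) for p in points]

    def neighbours(i):
        return [j for j, q in enumerate(proj) if math.dist(proj[i], q) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # low-density point; a cluster may still claim it later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a dense point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # j is itself dense: keep expanding
    return labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 5.0), (0.2, 10.0)]
    print(density_cluster(data, eps=0.3, min_pts=3, dims=[0]))  # dense in dimension 0
    print(density_cluster(data, eps=0.3, min_pts=3))            # sparse in both dimensions
```

The usage example shows why the choice of subspace matters: the same three points form a cluster when only the first dimension is considered, but are all outliers in the full two-dimensional space.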
Create Subclu
The parameters for this method are defined as follows:
- Enter a Model name.
- Select a Datasource from the list (if applicable).
- Select a Learning set from the list.
- Select the attribute(s) (variables) from the list for the Input.
- Enter a Cluster name prefix. The default prefix is "CLUSTER-".
- Enter an Attribute suffix (Variable suffix). The default suffix is "SUBCLU-1 Cluster".
- Enter Epsilon, default 0.1. For more information, see Epsilon.
- Select Cluster silhouette: yes or no. For more information, see Cluster silhouette.
- Enter the Minimum points, default 10. For more information, see Minimum points.
- Enter the Min Dimensions. For more information, see Min Dimensions.
- Click Save to generate the clusters.
Tip: To visualize the K-Means and Subclu results, use the scatter plot: choose the attributes (variables) for the x and y axes, then set the condition to the Nearest CLUSTER-Name.
Hierarchical Clustering
Hierarchical clustering is a model whose result is viewed as a dendrogram, a tree diagram frequently used to illustrate the arrangement of the clusters that hierarchical clustering produces. For more information, see Dendrograms.
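To show where the tree structure of a dendrogram comes from, here is a minimal sketch of bottom-up (agglomerative) clustering. It repeatedly merges the two closest clusters and records each merge; the recorded merges are the branches of the dendrogram. The function name `agglomerate` and the single-linkage distance are assumptions, since this page does not say which linkage the product uses.

```python
import math


def agglomerate(points):
    """Single-linkage agglomerative clustering.

    Returns a list of merges (cluster_a, cluster_b, distance). Ids below
    len(points) are the original points; each merge creates the next free id.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Find the pair of clusters whose closest members are nearest to each other.
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((a, b, d))
        # The merged cluster gets a new id, like an internal node of the tree.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


if __name__ == "__main__":
    for a, b, d in agglomerate([(0.0,), (1.0,), (10.0,)]):
        print(f"merge {a} and {b} at distance {d}")
```

Reading the merges in order gives the dendrogram from the leaves upward: early merges at small distances are the low branches, and the final merge is the root.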