Dendrograms

A dendrogram is the graphical representation of a statistical technique called “hierarchical agglomerative clustering”. Hierarchical clustering builds a sequence of N clusterings, one with k clusters for each k ∈ {1, ..., N}, such that the clusterings are nested: each one is obtained by merging two clusters of the previous one.

The agglomerative algorithm starts with the initial set of N attributes, considered as N singleton clusters. At each step it proceeds by identifying the two most similar clusters and merging them to form a new cluster. This step is repeated until all attributes have been merged together into a single cluster.
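The merging loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function name `agglomerate`, the dictionary-based similarity lookup, and the choice of single linkage (two clusters are as similar as their most similar pair of members) are all assumptions, since this page does not state which linkage criterion is used. The attribute names and similarity values are made up.

```python
from itertools import combinations


def agglomerate(names, sim, linkage=max):
    """Greedily merge the two most similar clusters until one remains.

    names: list of attribute names.
    sim: dict mapping frozenset({a, b}) to the similarity of a and b.
    linkage: how member similarities combine into a cluster similarity;
             max means single linkage (an assumption, see the text above).
    Returns the merges in order as (cluster_a, cluster_b, similarity).
    """
    clusters = [(name,) for name in names]  # start with N singleton clusters
    merges = []
    while len(clusters) > 1:
        best = None
        # Examine every pair of current clusters and keep the most similar.
        for a, b in combinations(clusters, 2):
            score = linkage(sim[frozenset((x, y))] for x in a for y in b)
            if best is None or score > best[0]:
                best = (score, a, b)
        score, a, b = best
        # Replace the two chosen clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, score))
    return merges


# Hypothetical similarities between three attributes.
sim = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.2,
    frozenset(("b", "c")): 0.4,
}
merges = agglomerate(["a", "b", "c"], sim)
for left, right, score in merges:
    print(left, "+", right, "at similarity", score)
```

With N attributes the loop always performs N - 1 merges, and the order of the merges is what a dendrogram draws from the leaves up to the root.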

The similarity among the attributes is measured by the correlation coefficient, which takes values in the range [-1,1]:

rho(x,y) = cov(x,y) / (σx σy)

where cov(x,y) is the covariance between variables x and y, and σx and σy are the standard deviations of x and y.
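As a worked illustration, the correlation coefficient can be computed directly from its definition: the covariance divided by the product of the two standard deviations. The sketch below uses only the Python standard library; the function name `pearson` and the sample values are assumptions made for the example, and the population form (dividing by n) is used throughout, which gives the same ratio as the sample form.

```python
import math


def pearson(x, y):
    """Correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance: average product of the deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Standard deviations of each variable.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


# Made-up data: y rises with x, z falls as x rises.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [8.0, 6.0, 4.0, 2.0]
print(pearson(x, y), pearson(x, z))
```

A value near 1 means the two attributes rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship. The function is undefined when either attribute is constant, because its standard deviation is zero.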

Create a Dendrogram

To launch the dendrogram editor, select Explore > Dendrogram from the menu. Alternatively, click the icon in the sidebar and then click Add new.... The parameters for a dendrogram are defined on two tabs at the top of the page: Data and Properties.

On the Data tab: 

  1. Select a Data source from the list.
  2. Select an Object set from the list.

  3. Select attributes from the Attribute list and click Input. Attributes used in dendrograms must be numerical.

Find an Attribute

Attributes are listed alphabetically. To find an attribute, use the scroll bar or enter its name in the search field.

On the Properties tab:

  1. Enter a chart Title and select the check box if you want it to show. 
  2. Click Compute to load the data. The results are displayed in two tabs: Dendrogram and Correlation Matrix.

Control the View

To change the view, use the control menu below the dendrogram to modify the zoom. To modify the dendrogram, click Edit and revise its parameters.

To export the chart:

  1. Click More actions and select a file format: PDF, PNG, or SVG.
  2. Enter a file name and path, and click Export.

When to use a dendrogram?

A dendrogram is an effective tool for analyzing similarities among attributes and for eliminating attributes that are highly correlated and therefore likely to carry redundant information. It is also useful for detecting strong correlations between an attribute of interest and the other attributes, for example, between a goal attribute and the input attributes.
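One way to act on such redundancy is sketched below, under stated assumptions: attributes are visited in order, and an attribute is kept only if its absolute correlation with every attribute already kept stays below a cutoff. The function name `drop_redundant`, the cutoff of 0.9, and the correlation values are all hypothetical; this page does not prescribe a threshold or a selection procedure.

```python
def drop_redundant(names, corr, threshold=0.9):
    """Return the attributes that are not too correlated with an earlier one.

    names: attribute names, in order of preference.
    corr: dict mapping frozenset({a, b}) to the correlation of a and b.
    threshold: hypothetical cutoff on the absolute correlation.
    """
    kept = []
    for name in names:
        # Keep the attribute only if it is weakly correlated with all kept ones.
        if all(abs(corr[frozenset((name, k))]) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up correlations: "a" and "b" are nearly redundant.
corr = {
    frozenset(("a", "b")): 0.95,
    frozenset(("a", "c")): 0.10,
    frozenset(("b", "c")): -0.20,
}
kept = drop_redundant(["a", "b", "c"], corr)
print(kept)
```

The absolute value is used because a strong negative correlation carries as much redundant information as a strong positive one. The result depends on the order of `names`, so the attributes considered most important should come first.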

Example Visualization

The following example illustrates the correlation of FUEL_WEEK_AVRG_MODEL with the energy gathered from solar panels. The negative sign indicates that when sunlight is abundant, fuel consumption is lower. The lowest correlation coefficient between FUEL_WEEK_AVRG_MODEL and the two solar attributes, SUN_ENERGY_WEEK_AVRG and SUN_WEEK_AVGR_HR, is -0.502452.

 

 
