Dendrograms

A dendrogram is the graphical representation of the result of a statistical method called “hierarchical agglomerative clustering”. Hierarchical clustering defines a sequence of N clusterings, one with k clusters for each k ∈ [1,...,N], such that the resulting clusters form a nested sequence.

The agglomerative algorithm starts with the initial set of N attributes, considered as N singleton clusters. At each step it proceeds by identifying the two most similar clusters and merging them to form a new cluster. This step is repeated until all attributes have been merged together into a single cluster.
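
The merge loop described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function name `agglomerate`, the attribute names and the similarity values are hypothetical, and the similarity between two clusters is assumed to be the highest similarity between any pair of their members (single linkage), which the text does not specify.

```python
from itertools import combinations


def agglomerate(names, similarity):
    """Merge the two most similar clusters until a single cluster remains.

    names: list of unique attribute names.
    similarity: function (a, b) -> float for two attribute names.
    Assumption: cluster-to-cluster similarity is the highest similarity
    between any pair of members (single linkage).
    Returns the merges in order, as (cluster_a, cluster_b, similarity).
    """
    clusters = [(name,) for name in names]  # N singleton clusters
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the most similar pair of clusters.
        for a, b in combinations(clusters, 2):
            s = max(similarity(x, y) for x in a for y in b)
            if best is None or s > best[2]:
                best = (a, b, s)
        a, b, s = best
        # Replace the two clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)
        merges.append((a, b, s))
    return merges


# Hypothetical pairwise similarities between three attributes.
SIM = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.2,
    frozenset(("B", "C")): 0.4,
}

merges = agglomerate(["A", "B", "C"], lambda x, y: SIM[frozenset((x, y))])
for a, b, s in merges:
    print(a, "+", b, "at similarity", s)
```

With N attributes the loop always performs N - 1 merges, which is why the resulting clusterings are nested.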

The similarity between attributes is measured by the correlation coefficient, which takes values in the range [-1,1]:

rho(x,y) = cov(x,y) / (σx · σy)

where cov(x,y) is the covariance between variables x and y, and σx and σy are the standard deviations of x and y.
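
As a worked illustration of this coefficient, the following minimal Python sketch computes it from two lists of values. The function name `pearson` and the sample data are hypothetical. The covariance and standard deviations use the population (divide by n) form, which is an assumption; the choice does not change the ratio as long as both use the same form.

```python
from math import sqrt


def pearson(x, y):
    """Correlation coefficient: cov(x, y) / (sigma_x * sigma_y).

    Assumes x and y have the same length and that neither is constant
    (a constant series has zero standard deviation, so the ratio is undefined).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sigma_x * sigma_y)


# Hypothetical data: y grows with x, z shrinks as x grows.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
z = [3.0, 2.0, 1.0]
print(pearson(x, y))  # close to 1: perfect positive linear correlation
print(pearson(x, z))  # close to -1: perfect negative linear correlation
```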

Create a Dendrogram

To launch the dendrogram editor, select Visualize > Dendrogram from the menu. Alternatively, click the dendrogram icon in the sidebar and then click New.

  1. Enter a chart Title.

  2. Select an Object set from the list, if required. 

  3. Select attributes from the Attribute list and click Input. Attributes used in dendrograms must be numerical.

  4. Click Save.

  5. The Dendrogram tool generates two different views:

    1. Dendrogram tree (see the Dendrogram tab) shows groups of linearly correlated attributes and clusters highly correlated attributes together on the tree. The closer the value is to 1 or -1, the stronger the correlation. More strongly correlated values are displayed on the right.
    2. Correlation matrix gives the linear correlation coefficient for each pair of attributes. Positive coefficients are displayed in green, negative ones in red.

Tip: Find an Attribute

Attributes are listed alphabetically. To find an attribute, use the scroll bar or enter the name in the Attribute field.

To clone:

  1. Click More actions > Clone to clone the dendrogram, or More actions > Clone as and select Temporal Curves, Dendrogram, Summary Chart or Multiplot.

To export data:

  1. Click More actions > General Actions 

    1. Click Download Data. Choose the CSV format (CSV US or CSV EU). 
    2. Click Export Matrix to CSV to download the correlation matrix. Choose the CSV format (CSV US or CSV EU). 

To export the graphic:

  1. Click More actions > Export graphic as and select a file format: PDF, PNG or SVG.

To create a new attribute selection:

  1. In the Correlation Matrix tab, use the check boxes to select attributes one by one, or select all of them using the first check box (beside the empty field used to filter attributes).

  2. Click More Actions > Attribute Selection. You can create: Attribute Set, Fill Missing Values, Differentiated Attribute, Moving Average, Shifted Attribute.

To create different charts:

  1. In the Correlation Matrix tab, use the check boxes to select attributes one by one, or select all of them using the first check box (beside the empty field used to filter attributes).

  2. Click More Actions > Attribute Selection. You can create: Histogram, Temporal Curves and a Dendrogram (using the new set of attributes).


Info: When to use a dendrogram?

A dendrogram is an effective tool for analyzing similarities among attributes and for eliminating attributes that are too strongly correlated (and thus probably carry redundant information). It is also useful for detecting important correlations between an attribute of interest and the other attributes, for example, between a goal attribute and the input attributes.
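
To make the redundancy check concrete, the sketch below lists attribute pairs whose correlation is strong enough to suggest that one of them could be dropped. It is an illustration only: the function name `redundant_pairs`, the attribute names, the coefficients and the 0.9 threshold are all hypothetical, and the right threshold depends on the data set.

```python
def redundant_pairs(corr, threshold=0.9):
    """Return attribute pairs whose |correlation| is at or above the threshold.

    corr: dict mapping (attribute_a, attribute_b) -> correlation coefficient.
    The threshold of 0.9 is an arbitrary illustrative default.
    """
    return sorted(pair for pair, rho in corr.items() if abs(rho) >= threshold)


# Hypothetical correlation coefficients for three attributes.
corr = {
    ("TEMP_IN", "TEMP_OUT"): 0.97,
    ("TEMP_IN", "PRESSURE"): -0.35,
    ("TEMP_OUT", "PRESSURE"): -0.93,
}

for a, b in redundant_pairs(corr):
    print(a, "and", b, "are strongly correlated")
```

The absolute value is used so that strong negative correlations are flagged as well as strong positive ones.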

Example Visualization

The following example illustrates the correlation of FUEL_WEEK_AVRG_MODEL with the energy gathered from solar panels. The minus sign (-) indicates that when sunlight is abundant, fuel consumption is lower. The minimum correlation coefficient between the solar attributes (SUN_ENERGY_WEEK_AVRG and SUN_WEEK_AVGR_HR) and FUEL_WEEK_AVRG_MODEL is -0.502452.


Tip: How to interpret the Dendrogram

The dendrogram shows groups of correlated attributes. This view is a graphical summary of the correlation matrix. Note: the dendrogram shows the absolute value of each coefficient, so only values between 0 and 1 appear. A strength of 0 means no correlation and 1 means a perfect correlation (positive or negative).
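
The relationship between a signed coefficient and the strength shown on the dendrogram can be written as a one-line function. This is a minimal sketch; the name `strength` is hypothetical, and the value used is the coefficient from the example visualization.

```python
def strength(rho):
    """Correlation strength as shown on the dendrogram: the absolute value of rho."""
    return abs(rho)


# Coefficient from the example visualization: the sign is dropped on the tree.
print(strength(-0.502452))  # 0.502452
print(strength(0.502452))   # 0.502452
```

A positive and a negative coefficient of the same magnitude therefore sit at the same level on the tree; the sign is only visible in the correlation matrix.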