Description:
Clustered heat maps give a visual representation of a data matrix. Clustering algorithms place similar rows near one another and similar columns near one another. The color of each cell represents its value on a color gradient. This visualization helps the investigator find patterns in large data matrices that would otherwise be difficult to detect.
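To show what "value on a color gradient" means computationally, here is a minimal Python sketch that maps each cell linearly onto a two-color gradient running from the smallest to the largest value in the matrix. The function name value_to_rgb and the blue-to-red endpoints are hypothetical choices for illustration. NCSS's own color schemes are not reproduced here.

```python
def value_to_rgb(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Map value linearly from `low` (at vmin) to `high` (at vmax).

    The blue-to-red default endpoints are a hypothetical choice.
    """
    t = 0.5 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


# Hypothetical 2 x 3 data matrix; the gradient spans its smallest and largest values.
matrix = [[0.0, 5.0, 10.0],
          [2.5, 7.5, 10.0]]
vmin = min(min(row) for row in matrix)
vmax = max(max(row) for row in matrix)
colors = [[value_to_rgb(v, vmin, vmax) for v in row] for row in matrix]
print(colors[0])  # [(0, 0, 255), (128, 0, 128), (255, 0, 0)]
```

A linear gradient is the simplest choice. Because the endpoints are tied to the matrix minimum and maximum, a few extreme values can compress the colors of everything else. This is one reason the scaling described next matters.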
Usually, a clustered heat map is made from variables measured on similar scales, such as test scores or gene expression measurements. If the variables have different scales, the data matrix should first be standardized, for example with a z-score transformation or by expressing each value as a proportion of its variable's range.
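Both standardizations can be sketched in a few lines of Python. Each is applied column by column, with one variable per column. The function names are hypothetical. Using the sample standard deviation for the z-score is an assumption, and NCSS may use a different convention.

```python
import statistics


def zscore_columns(matrix):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in zip(*matrix)]
    # Constant columns (stdev 0) are mapped to 0.0, an arbitrary choice.
    return [[(x - m) / s if s else 0.0 for x, (m, s) in zip(row, stats)]
            for row in matrix]


def range_scale_columns(matrix):
    """Rescale each column to a proportion of its range: (x - min) / (max - min)."""
    bounds = [(min(c), max(c)) for c in zip(*matrix)]
    # Constant columns (max == min) are mapped to 0.0, an arbitrary choice.
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
            for row in matrix]


# Hypothetical data: a test score (0-100 scale) and a rating (1-5 scale).
data = [[50, 1.0],
        [60, 2.0],
        [70, 3.0]]
print(zscore_columns(data))       # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
print(range_scale_columns(data))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

After either transformation, the two variables contribute on comparable scales to the distances used for clustering and to the color gradient.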
NCSS offers eight hierarchical clustering algorithms. The method selected for the columns need not be the same as the method selected for the rows. The algorithms are described in the Hierarchical Clustering chapter of the NCSS documentation and are not discussed here.
As an example, suppose that scientists researching yeast infections wish to compare the expression of 18 genes across 55 yeast species. The researchers are looking for groups of yeasts and groups of genes that behave similarly. The data set lists the yeast ID numbers in the first column, followed by 18 columns of gene expression values.
In the procedure, the gene expression columns are entered as the Cluster Variables, and Yeast ID is the Row Label Variable. No additional scaling is needed.
The Group Average clustering method is used for both Variables and Rows, and the Distance Method is set to Euclidean. Max Clusters is set to 5 for both Variables and Rows. The pros and cons of each clustering method, and how to choose the maximum number of clusters, are beyond the scope of this video. One can experiment with different values to see their effect.
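For readers who want to see the mechanics, the following is a minimal, standard-library-only Python sketch of group average (UPGMA) hierarchical clustering with Euclidean distance. It is applied separately to the rows and to the columns (via the transpose), and the tree is then cut so that a chosen number of clusters remains. This is not NCSS's implementation. The function names and the small 6 by 4 matrix are hypothetical. The toy data use 3 row clusters and 2 column clusters rather than 5 because the matrix is so small.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_average_clustering(vectors):
    """Agglomerative clustering with group average (UPGMA) linkage.

    Returns (merges, leaf_order). Each merge is (a, b, distance, new_id):
    ids 0..n-1 are the original items; n, n+1, ... are merged clusters.
    """
    n = len(vectors)
    members = {i: [i] for i in range(n)}  # cluster id -> original item indices
    dist = {(i, j): euclidean(vectors[i], vectors[j])
            for i, j in combinations(range(n), 2)}

    def cluster_dist(a, b):
        # Group average: mean of all item-to-item distances between two clusters.
        pairs = [(min(i, j), max(i, j)) for i in members[a] for j in members[b]]
        return sum(dist[p] for p in pairs) / len(pairs)

    merges = []
    next_id = n
    while len(members) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sorted(members), 2), key=lambda p: cluster_dist(*p))
        d = cluster_dist(a, b)
        members[next_id] = members.pop(a) + members.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    # The one remaining cluster lists the items in dendrogram leaf order.
    leaf_order = next(iter(members.values())) if members else []
    return merges, leaf_order


def cut_tree(n, merges, max_clusters):
    """Label items 1..k by replaying merges until max_clusters clusters remain."""
    members = {i: [i] for i in range(n)}
    for a, b, _, new_id in merges[: max(n - max_clusters, 0)]:
        members[new_id] = members.pop(a) + members.pop(b)
    labels = [0] * n
    # Number the clusters in order of their lowest-numbered member.
    for label, items in enumerate(sorted(members.values(), key=min), start=1):
        for i in items:
            labels[i] = label
    return labels


# Hypothetical 6-row x 4-column matrix (rows = samples, columns = variables).
data = [
    [1.0, 1.1, 5.0, 5.2],
    [1.2, 0.9, 5.1, 4.9],
    [5.0, 5.1, 1.0, 1.2],
    [4.9, 5.2, 0.8, 1.1],
    [3.0, 3.1, 3.0, 2.9],
    [3.2, 2.9, 3.1, 3.0],
]

row_merges, row_order = group_average_clustering(data)
row_labels = cut_tree(len(data), row_merges, max_clusters=3)

columns = [list(col) for col in zip(*data)]  # cluster the variables via the transpose
col_merges, col_order = group_average_clustering(columns)
col_labels = cut_tree(len(columns), col_merges, max_clusters=2)

print(row_labels)  # [1, 1, 2, 2, 3, 3]
print(col_labels)  # [1, 1, 2, 2]
```

The member list of the final merged cluster gives a leaf order. Listing rows (or columns) in that order keeps items from the same subtree next to each other. This is the kind of reordering a clustered heat map uses to bring similar rows and columns together.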
The row labels are made smaller so that they do not overlap.
The clustered heat map shows the groupings of the genes and the yeast species.
The details shown in the Linkage Reports can be used to assess the quality or utility of the clustering results.