Документ взят из кэша поисковой машины. Адрес оригинального документа : http://www.sao.ru/precise/Midas_doc/doc/95NOV/vol2/node214.html
Дата изменения: Fri Feb 23 12:57:39 1996
Дата индексирования: Tue Oct 2 17:53:24 2012
Кодировка:
Поисковые слова: phoenix

Cluster Analysis

Next: Discriminant Analysis Up: Multivariate Analysis Methods Previous: Principal Components Analysis

Cluster Analysis

The routines implemented are CLUSTER which has 8 options for hierarchical clustering and PARTITION which carries out non--hierarchical clustering. We will look at the hierarchical options available first.

The automatic classification of the n row--objects of an n by m table generally produces output in one of two forms: the assignments to clusters found for the n objects; or a series of clusterings of the n objects, from the initial situation when each object may be considered a singleton cluster to the other extreme when all objects belong to one cluster. The former is non--hierarchical clustering or partitioning.

The latter is hierarchical clustering. Brief consideration will show that a sequence of n-1 agglomerations are needed to successively merge the two closest objects and/or clusters at each stage, so that we have a set of n (singleton) clusters, n-1 clusters, ..., 2 clusters, 1 cluster. This is usually represented by a hierarchic tree or a dendrogram, and a ``slice'' through the dendrogram defines a partition of the objects. Unfortunately, no rigid guideline can be indicated for deriving such a partition from a dendrogram except that large increases in cluster criterion values (which scale the dendrogram) can indicate a partition of interest.

In carrying out the sequence of agglomerations, various criteria are feasible for defining the newly--constituted cluster:

The Minimal Spanning Tree, which is closely related to the single link method, has been used in such applications as interferogram analysis and in galaxy clustering studies. It is useful as a detector of outlying data points (i.e. anomalous objects).

Routine PARTITION operates in one two options. For both, a partition of minimum variance, given the number of clusters, is sought. Two iterative refinement algorithms (minimum distance or the exchange method) constitute the options available.

Rein Warmels
Mon Jan 22 15:08:15 MET 1996