Document retrieved from a search engine cache. Original document address: http://herba.msu.ru/shipunov/os/google/google.pdf
Last modified: Sun Dec 7 05:28:13 2014
Indexed: Sun Apr 10 03:20:08 2016
Google as a taxonomic engine

Alexey Shipunov


Background: tree
(Figure: tree of cow, mouse and rat)


Background: Google Scholar


· Extract phyla names
· Obtain numbers of joint hits
· Do some magic
· Calculate similarity
· Make clusters
· Same, with "-ecology"
· Same, with class names


Numbers of joint hits
· Phyla names from my "synat" classification
· R script to make command-line queries
· Links text browser to make textual output
· UNIX text tools (sed, grep) to clean results
· Comma-delimited file for import into R
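
The query-building step can be sketched as follows. This is a minimal illustration in Python rather than the R script used for the talk, and it only constructs the queries; it does not fetch or clean any results. The base URL, the quoting of each name and the three example phyla are assumptions made for the sketch, not details taken from the slides.

```python
from itertools import combinations
from urllib.parse import urlencode

# Assumed base URL; the slides do not state the exact address that was queried.
BASE_URL = "https://scholar.google.com/scholar"


def pair_queries(names, exclude=None):
    """Yield (name_a, name_b, url) for every unordered pair of names."""
    for a, b in combinations(names, 2):
        # Assumption: each name is quoted so that it is searched as an exact term.
        query = f'"{a}" "{b}"'
        if exclude:
            # Append a negated term, e.g. -ecology.
            query += f" -{exclude}"
        yield a, b, f"{BASE_URL}?{urlencode({'q': query})}"


# Illustrative subset only, not the full list of phyla from the classification.
phyla = ["Bryophyta", "Chordata", "Mollusca"]

queries = list(pair_queries(phyla))
no_ecology = list(pair_queries(phyla, exclude="ecology"))
```

Each URL could then be passed to a text browser, with the hit count extracted from the page text and written out as one comma-delimited line per pair.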


Magic with numbers
· Some names appear much more often than others
  - find the numbers of individual hits
  - make weights
  - multiply the numbers of joint hits by the geometric means of the individual hits for the two taxa in each pair
· Convert the three-column table into a square matrix
· Convert similarities to dissimilarities
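
One possible reading of these steps, sketched in Python with made-up hit counts. The slide does not spell out the weighting, so the sketch assumes each taxon's weight is the reciprocal of its individual hit count; multiplying by the geometric mean of the two weights is then the same as dividing the joint hits by the geometric mean of the two individual counts. The conversion to dissimilarities (one minus the similarity scaled by its maximum) is likewise an assumption.

```python
import math

# Made-up hit counts, for illustration only.
individual = {"Bryophyta": 400, "Chordata": 900, "Mollusca": 100}

# Three-column table: taxon, taxon, number of joint hits.
joint = [
    ("Bryophyta", "Chordata", 60),
    ("Bryophyta", "Mollusca", 20),
    ("Chordata", "Mollusca", 90),
]


def weighted_similarity(a, b, hits):
    # Assumed weighting: weight = 1 / individual hits, so multiplying by the
    # geometric mean of the two weights divides by sqrt(n_a * n_b).
    return hits / math.sqrt(individual[a] * individual[b])


# Convert the three-column table into a symmetric square matrix.
names = sorted(individual)
index = {name: i for i, name in enumerate(names)}
sim = [[0.0] * len(names) for _ in names]
for a, b, hits in joint:
    s = weighted_similarity(a, b, hits)
    sim[index[a]][index[b]] = s
    sim[index[b]][index[a]] = s

# Assumed conversion: scale by the largest similarity and subtract from 1,
# keeping zeros on the diagonal.
top = max(max(row) for row in sim)
dissim = [
    [0.0 if i == j else 1.0 - s / top for j, s in enumerate(row)]
    for i, row in enumerate(sim)
]
```

With this weighting, a pair of rarely mentioned taxa that often appear together scores as high as a pair of frequently mentioned taxa with proportionally more joint hits.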


Similarity
· Calculate Euclidean distances
· Ward's method hierarchical clustering
· Tree of clusters is NOT a phylogenetic tree
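
A small pure-Python sketch of these two steps, assuming that Euclidean distances are taken between the rows of the dissimilarity matrix and that Ward's criterion is applied through the Lance-Williams update on squared distances. The four taxa and all numbers are invented for illustration, and merge heights from a statistics package may be scaled differently.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ward_clusters(labels, rows):
    """Return the merge history as [(members_a, members_b, height), ...]."""
    members = {i: (lab,) for i, lab in enumerate(labels)}
    # Squared Euclidean distances between all currently active clusters.
    d2 = {
        frozenset((i, j)): euclidean(rows[i], rows[j]) ** 2
        for i, j in combinations(range(len(labels)), 2)
    }
    merges = []
    next_id = len(labels)
    while len(members) > 1:
        pair = min(d2, key=d2.get)  # closest pair of clusters
        i, j = sorted(pair)
        dij = d2.pop(pair)
        ni, nj = len(members[i]), len(members[j])
        updated = {}
        for k in members:
            if k in (i, j):
                continue
            nk = len(members[k])
            dik = d2.pop(frozenset((i, k)))
            djk = d2.pop(frozenset((j, k)))
            # Lance-Williams update for Ward's method on squared distances.
            updated[frozenset((k, next_id))] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * dij
            ) / (ni + nj + nk)
        # Height is reported as the square root of the merged squared distance.
        merges.append((members[i], members[j], math.sqrt(dij)))
        members[next_id] = members.pop(i) + members.pop(j)
        d2.update(updated)
        next_id += 1
    return merges


# Invented dissimilarity matrix for four taxa, for illustration only.
labels = ["Bryophyta", "Marchantiophyta", "Chordata", "Echinodermata"]
rows = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
]
merges = ward_clusters(labels, rows)
```

The merge history records only which groups are closest in terms of joint hits; it says nothing about ancestry, which is why the resulting tree is not a phylogenetic tree.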


Result

(Figure legend: strange groups, good groups, questionable groups)


Same, -ecology


Classes


Some reliable groups
Primitive animals

Mosses

Basal bacterial groups

Algae with primary chloroplasts
Ascomycete fungi


Questionable groups

Spiralians + starfishes

Algal mix

Two distantly related groups, both contain amoebae


Strange groups
Insects + flowering plants protists

Creatures with shells

Vertebrates + Gram-positive bacteria

Parasitic creatures, animals and protists


Conclusions
· It is working!
· In most cases, only closest taxa were revealed; animals, bacteria and fungi were intermixed
· "-ecology" did not help
· Classes are generally better than phyla


"Best classification is a classification which does not exist, it is a constantly changing product of processing all reliable data available on-line"