This document was retrieved from a search engine cache. Address of the original document: http://hea-www.harvard.edu/AstroStat/slog/groundtruth.info/AstroStat/slog/author/hlee/page/9/index.html
Date modified: Unknown
Date indexed: Sun Mar 2 23:38:01 2014

The AstroStat Slog » hlee

Author Archive

[ArXiv] 1st week, May 2008

I think I have to review spatial statistics in astronomy, focusing on tessellation (void structure), point processes (expanding on the 2- (and 3-) point correlation function), and marked point processes (the spatial distribution of hardness ratios of distant X-ray sources, or of different types of galaxies, where the marks are not only morphological differences but also quantities such as absolute magnitude and the presence of particular features). When? Someday…

In addition to Bayesian methodologies, studies characterizing the empirical spatial distributions of voids and galaxies appear frequently, as in this week’s astro-ph; I believe they can be enriched further with ideas from stochastic geometry and spatial statistics. Click to see what appeared on arXiv this week. Continue reading ‘[ArXiv] 1st week, May 2008’ »

[ArXiv] 5th week, Apr. 2008

Ever since I first learned about Hubble’s tuning fork[1], I have wanted to classify galaxies based on their features (colors and spectra) instead of relying on labor-intensive classification by eye (semi-supervised learning seems most suitable). Ironically, at that time I did not know that machine learning, a field of computer science, and statistics both study such problems. After I switched to statistics, hoping to understand the statistical packages implemented in IRAF and IDL and to learn the contents of Numerical Recipes and Bevington’s book better, ignorance was no longer the enemy; the accessibility of data was. Continue reading ‘[ArXiv] 5th week, Apr. 2008’ »

  1. Wikipedia link: Hubble sequence

tests of fit for the Poisson distribution

Skimming arXiv:astro-ph abstracts for almost a year never offered me an occasion where the fit of the Poisson distribution was tested in different ways; instead, the Poisson assumption is taken for granted by plugging the data and a (source) model into a (modified) χ² function. If any doubts about the Poisson distribution arise, the following paper might be useful: Continue reading ‘tests of fit for the Poisson distribution’ »
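As one illustration of what a test of fit for the Poisson distribution can look like, here is a minimal sketch of the classical index-of-dispersion test, which checks whether the sample variance is compatible with the sample mean. It is not taken from the paper mentioned above; the function name `poisson_dispersion_test` is hypothetical, and the p-value uses the Wilson-Hilferty normal approximation to the chi-square tail so that only the Python standard library is needed.

```python
import math


def poisson_dispersion_test(counts):
    """Index-of-dispersion test for Poisson counts (illustrative sketch).

    Under a Poisson model the variance equals the mean, so
    D = sum((x - mean)^2) / mean is approximately chi-square
    distributed with n - 1 degrees of freedom.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = sum(counts) / n
    if mean <= 0:
        raise ValueError("need a positive sample mean")
    stat = sum((x - mean) ** 2 for x in counts) / mean
    dof = n - 1
    # Assumption: Wilson-Hilferty approximation instead of an exact
    # chi-square tail; adequate for a rough check, not for tiny samples.
    z = ((stat / dof) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * dof))) / math.sqrt(
        2.0 / (9.0 * dof)
    )
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(chi2 >= stat), approx.
    return stat, dof, p_upper


if __name__ == "__main__":
    # Made-up counts with two large bursts, i.e. clearly overdispersed.
    stat, dof, p = poisson_dispersion_test([0, 0, 0, 20, 0, 0, 0, 20, 0, 0])
    print(stat, dof, p)
```

A small upper-tail p-value points to overdispersion relative to the Poisson model; the test has little power against alternatives whose variance happens to match the mean.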

[ArXiv] 4th week, Apr. 2008

The last paper in the list discusses MCMC for time series analysis, applied to sunspot data. There are six additional papers about statistics and data analysis from the week. Continue reading ‘[ArXiv] 4th week, Apr. 2008’ »

The LRT is worthless for …

One of the speakers in the Google talk series presented an example of model-based clustering and mentioned the likelihood ratio test (LRT) for determining the number of clusters. Since I have seen improperly applied LRTs in astronomical journals, such as tests of two clusters vs. three or of higher numbers of components, I could not resist pointing out that the LRT appeared to be misused in his illustration. In reply, he explained that the citation regarding the LRT referred to something different from his plot and that the test was carried out for one component vs. two, which closely observes the regularity conditions. I was relieved not to find another example of an ill-used LRT. Continue reading ‘The LRT is worthless for …’ »

[ArXiv] Ripley’s K-function

Because of the extensive work by Prof. Peebles and many (observational) cosmologists (I almost always find Prof. Peebles’ book cited in the cosmology literature), the 2- (or 3-) point correlation function is far more dominant than any other mathematical or statistical method for understanding the structure of the universe. Unusually, this week brings an astro-ph paper, written by a statistics professor, that uses the K-function to explore the mystery of the universe.

[astro-ph:0804.3044] J.M. Loh
Estimating Third-Order Moments for an Absorber Catalog
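
For readers who have not met Ripley's K-function, the following is a minimal sketch of the naive estimator for a planar point pattern. It is not the estimator used in the paper above: it applies no edge correction, works in 2D only, and the function name `ripley_k` and the `area` argument are illustrative choices.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K estimate without edge correction (sketch).

    K(r) is estimated as area / (n * (n - 1)) times the number of
    ordered pairs (i, j), i != j, whose separation is at most r.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                close_pairs += 2  # count both (i, j) and (j, i)
    return area * close_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Made-up pattern: the four corners of a unit square.
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(ripley_k(corners, r=1.0, area=1.0))
```

For a completely random (homogeneous Poisson) pattern in the plane, K(r) = πr², so estimates above that curve suggest clustering at scale r and estimates below it suggest regularity. Without edge correction the estimate is biased low for r comparable to the size of the window.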

Continue reading ‘[ArXiv] Ripley’s K-function’ »

[ArXiv] 3rd week, Apr. 2008

The dichotomy of outliers: detecting outliers to be discarded or to be investigated; statistics robust enough not to be influenced by outliers or sensitive enough to flag anomalies in the data distribution. Although it is not directly related, one paper about outliers made me dwell on what outliers are. This week’s topics are diverse. Continue reading ‘[ArXiv] 3rd week, Apr. 2008’ »

AstroGrid Desktop Suite

AstroGrid Desktop Suite is available. Check the AstroGrid website http://www.astrogrid.org for more information. Continue reading ‘AstroGrid Desktop Suite’ »

PCA

Prof. Speed writes columns for IMS Bulletin and the April 2008 issue has Terence’s Stuff: PCA (p.9). Here are quotes with minor paraphrasing:

Although a quintessentially statistical notion, my impression is that PCA has always been more popular with non-statisticians. Of course we love to prove its optimality properties in our courses, and at one time the distribution theory of sample covariance matrices was heavily studied.

…but who could not feel suspicious when observing the explosive growth in the use of PCA in the biological and physical sciences and engineering, not to mention economics?…it became the analysis tool of choice of the hordes of former physicists, chemists and mathematicians who unwittingly found themselves having to be statisticians in the computer age.

My initial theory for its popularity was simply that they were in love with the prefix eigen-, and felt that anything involving it acquired the cachet of quantum mechanics, where, you will recall, everything important has that prefix.

He gave the following eigen-’s: eigengenes, eigenarrays, eigenexpression, eigenproteins, eigenprofiles, eigenpathways, eigenSNPs, eigenimages, eigenfaces, eigenpatterns, eigenresult, and even eigenGoogle.

How many miracles must one witness before becoming a convert?…Well, I’ve seen my three miracles of exploratory data analysis, examples where I found I had a problem, and could do something about it using PCA, so now I’m a believer.

Needless to say, astronomers explore data with PCA and use eigenvalues and eigenvectors to transform raw data into more interpretable forms.
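
To make the eigen- part concrete, here is a minimal sketch that finds the first principal component by power iteration on the sample covariance matrix, using only the Python standard library. The function name `first_principal_component` and the starting vector are illustrative assumptions; in practice one would normally call a linear algebra library instead.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading eigenvalue/eigenvector of the sample covariance (sketch).

    Returns (eigenvalue, unit eigenvector, scores), where scores are the
    centered data projected onto the eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(d)]
    centered = [[row[k] - means[k] for k in range(d)] for row in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Assumption: an arbitrary asymmetric start vector; power iteration
    # fails if it happens to be orthogonal to the leading eigenvector.
    v = [1.0 / (k + 1) for k in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
    # Rayleigh quotient; equals v^T C v when v has unit length.
    eigenvalue = sum(v[a] * cv[a] for a in range(d)) / sum(x * x for x in v)
    scores = [sum(c[k] * v[k] for k in range(d)) for c in centered]
    return eigenvalue, v, scores


if __name__ == "__main__":
    # Made-up data lying exactly on the line y = x.
    print(first_principal_component([(0, 0), (1, 1), (2, 2), (3, 3)]))
```

The eigenvalue is the variance captured along the returned direction, and the scores are the "more interpretable" one-dimensional coordinates of each observation.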

Lomb-Scargle periodograms in bioinformatics

A statistical method developed by insightful and brilliant astronomers is used in bioinformatics:
Detecting periodic patterns in unevenly spaced gene expression time series using Lomb–Scargle periodograms
by Glynn, Chen, & Mushegian [Click for R code and relevant information] [Paper archive at Bioinformatics]

The conclusion clearly indicates the winning points of the Lomb-Scargle periodograms.

The Lomb-Scargle periodogram algorithm is an effective tool for finding periodic gene expression profiles in microarray data, especially when data may be collected at arbitrary time points or when a significant proportion of data is missing.
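
As a reminder of why the method copes with arbitrary time points, here is a minimal sketch of the classical normalized Lomb-Scargle periodogram in pure Python. It is not the R code referred to above; the function name `lomb_scargle`, the frequency grid, and the synthetic test signal are illustrative assumptions.

```python
import math


def lomb_scargle(times, values, frequencies):
    """Classical normalized Lomb-Scargle power at each frequency (sketch).

    Frequencies are in cycles per unit time and must be positive.
    """
    n = len(times)
    mean = sum(values) / n
    resid = [y - mean for y in values]
    variance = sum(r * r for r in resid) / (n - 1)
    powers = []
    for f in frequencies:
        omega = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal
        # on the actual (uneven) sampling times.
        tau = math.atan2(
            sum(math.sin(2.0 * omega * t) for t in times),
            sum(math.cos(2.0 * omega * t) for t in times),
        ) / (2.0 * omega)
        c = [math.cos(omega * (t - tau)) for t in times]
        s = [math.sin(omega * (t - tau)) for t in times]
        cc = sum(x * x for x in c)
        ss = sum(x * x for x in s)
        yc = sum(r * x for r, x in zip(resid, c))
        ys = sum(r * x for r, x in zip(resid, s))
        power = 0.0
        if cc > 0.0:
            power += yc * yc / cc
        if ss > 0.0:
            power += ys * ys / ss
        powers.append(power / (2.0 * variance))
    return powers


if __name__ == "__main__":
    # Synthetic, unevenly sampled sinusoid with frequency 0.1 (period 10).
    times = [1.3 * i + 0.5 * math.sin(2.7 * i) for i in range(50)]
    values = [math.sin(2.0 * math.pi * 0.1 * t) for t in times]
    freqs = [0.02 + 0.01 * k for k in range(49)]
    powers = lomb_scargle(times, values, freqs)
    print(freqs[powers.index(max(powers))])
```

Nothing in the computation requires evenly spaced samples, so missing observations are handled simply by leaving them out of `times` and `values`. For this synthetic signal the highest power should fall at or next to the input frequency.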

My personal wish is that data-driven statistical methods developed by hands-on scientists (and their statistical collaborators) be used in other disciplines, because I believe data sets are likely to share the unknown truth of our one universe.

[ArXiv] 2nd week, Apr. 2008

Markov chain Monte Carlo has become the most frequent and stable statistical application in astronomy. It would be useful to collect tutorials from both professions. Continue reading ‘[ArXiv] 2nd week, Apr. 2008’ »