[ArXiv] SDSS DR6, July 23, 2007
From arxiv/astro-ph:0707.3413
The Sixth Data Release of the Sloan Digital Sky Survey by … many people …
The sixth data release of the Sloan Digital Sky Survey (SDSS DR6) is available at http://www.sdss.org/dr6. Additionally, the Catalog Archive Server (CAS) and its SQL interface to the catalog should be useful to statisticians searching for data. Simple, well-documented SQL commands can narrow down both the size of the data and the spatial coverage.
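To give a flavor of how a short SQL command narrows down both the sample size and the sky area, here is a minimal sketch that only builds the query text. The helper name build_photo_query is hypothetical, and the table and column names (PhotoObj, objID, ra, dec, u, g, r, i, z) are my assumptions about the CAS schema, so check them against the schema browser before submitting anything.

```python
def build_photo_query(ra_min, ra_max, dec_min, dec_max, limit=1000):
    """Return SQL text selecting ugriz magnitudes inside an RA/Dec box.

    Hypothetical helper: it only assembles a string and never contacts CAS.
    Table and column names are assumptions about the CAS schema.
    """
    if not (ra_min < ra_max and dec_min < dec_max):
        raise ValueError("empty bounding box")
    return (
        # TOP n caps the number of returned rows (SQL Server syntax).
        f"SELECT TOP {int(limit)} objID, ra, dec, u, g, r, i, z\n"
        "FROM PhotoObj\n"
        # The two BETWEEN clauses restrict the spatial coverage (degrees).
        f"WHERE ra BETWEEN {float(ra_min)} AND {float(ra_max)}\n"
        f"  AND dec BETWEEN {float(dec_min)} AND {float(dec_max)}"
    )


if __name__ == "__main__":
    # One square degree near the celestial equator, at most 10 rows.
    print(build_photo_query(180, 181, 0, 1, limit=10))
```

The printed text can be pasted into the CAS SQL search form. Shrinking the box or the TOP value is the easiest way to keep a first download small.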
Part of my dissertation was about creating nonparametric multivariate analysis tools based on convex hull peeling, and I applied those tools to SDSS DR4 to explore celestial objects in the multidimensional color space without projections (dimension reduction). SDSS CAS might fulfill the needs of those who are looking for data sets to conduct
- massive multivariate data analysis,
- streaming data analysis (strictly speaking, SDSS is not streaming, but the database is updated yearly with new observations and, depending on memory, streaming data analysis can easily be simulated), and
- applications of their new machine learning and statistical multivariate analysis tools for new discoveries.
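For readers who have not met convex hull peeling, the idea is to strip off the convex hull of the point cloud, then the hull of what remains, and so on, so that the layer number serves as a nonparametric measure of depth. The sketch below is only a two-dimensional toy in plain Python (think of two colors such as u-g and g-r), not the multidimensional tools from the dissertation. The function names are mine, and points lying exactly on a hull edge are deliberately left for a later layer.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices of 2D points via Andrew's monotone chain (counter-clockwise)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last turn is clockwise or collinear.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def peel_depths(points):
    """Map each distinct point to its peeling layer (1 = outermost hull)."""
    remaining = set(points)
    depth = {}
    layer = 1
    while remaining:
        hull = convex_hull(remaining)
        for p in hull:
            depth[p] = layer
        remaining.difference_update(hull)
        layer += 1
    return depth


if __name__ == "__main__":
    # Toy data: an outer square, an inner square, and a center point.
    toy = [(0, 0), (4, 0), (4, 4), (0, 4),
           (1, 1), (3, 1), (3, 3), (1, 3),
           (2, 2)]
    for point, layer in sorted(peel_depths(toy).items(), key=lambda kv: kv[1]):
        print(layer, point)
```

Points in the deepest layers play the role of a multivariate median, while the outer layers flag candidate outliers. Recomputing the hull from scratch for every layer is fine for a toy, but it would need a smarter algorithm at catalog scale.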
In particular, thanks to the wide coverage of the northern sky, interesting spatial statistics can be developed, such as Voronoi tessellation for spatial density estimation. The survey also provides a vast image reservoir as well as the catalog of massive multivariate spatial data.
Oh, by the way, the paper discusses changes and improvements in the recent data release. SDSS DR6 includes the complete imaging of the Northern Galactic Cap and contains images and parameters of 287 million objects over 9583 deg^2, and 1.27 million spectra over 7425 deg^2. The photometric calibration has improved, with uncertainties of 1% in g, r, i and 2% in u, significantly better than in previous data releases. The method of spectrophotometric calibration has changed, making the spectrophotometric scale 0.35 mag brighter. Two independent codes for spectral classification and redshifts are available as well.
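For statisticians less used to magnitudes, a shift of 0.35 mag is easier to read as a flux ratio. The sketch below uses the standard magnitude-flux relation, under which a difference of 5 mag is a factor of 100 in flux. The function name flux_ratio is mine, and this is only a unit conversion, not anything taken from the paper's calibration procedure.

```python
def flux_ratio(delta_mag):
    """Flux ratio corresponding to a magnitude difference.

    Standard relation: 5 mag is a factor of 100 in flux,
    so the ratio is 10 ** (0.4 * delta_mag).
    """
    return 10 ** (0.4 * delta_mag)


if __name__ == "__main__":
    # A scale that is 0.35 mag brighter corresponds to this flux ratio.
    print(f"{flux_ratio(0.35):.2f}")
```

So a scale that is 0.35 mag brighter corresponds to fluxes that are higher by a factor of roughly 1.38.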
hlee:
I’m at Salt Lake City for the Joint Statistical Meetings (JSM). By accident (I wanted to go to Don Rubin’s causal inference talk at the same time), I ended up listening to a speaker whose work was motivated by an astronomer interested in regression and clustering on SDSS data. Sadly, he only applied well-known classical statistics to simulated bivariate data. In astronomy, I personally believe that simulated data and actual data behave quite differently, partly because uncertainty enters during the calibration procedure. This uncertainty is hard to model with simple probability theory. Another challenge is the computational time of the methods that the speaker introduced. Model-based clustering and k-means require iterative computation. With hundreds of millions of objects, I am suspicious about their feasibility.
07-31-2007, 3:16 pm