Document retrieved from a search engine cache. Original document address: http://www.atnf.csiro.au/research/workshops/2013/astroinformatics/talks/polsterer.pdf
Modified: Fri Dec 13 08:55:22 2013
Indexed: Sat Mar 1 09:39:00 2014
Astroinformatics

Heidelberg Institute for Theoretical Studies

Solving Regression Problems with Machine Learning
lessons learned from learning machines
now with optimized GPU models

December 12, 2013

Astroinformatics 2013 | Sydney, 9-13 December | Kai Lars Polsterer


Regression Problems in Astronomy
we want to determine parameters


metallicity, starburst ratio, redshift

... but detailed analysis is too expensive


observation time / telescope time
high spatial resolution, high spectral resolution, high time resolution

spatial problem
low sky coverage, long integration time, low sensitivity

analysis of large catalogs requires solving regression problems

$f(\vec{x}) \to y$, where $\vec{x} \in \mathbb{R}^n$, $y \in \mathbb{R}$


Photometric Redshifts
galaxy and quasar redshifts


simple relation
allows us to understand the early universe and find interesting objects
SDSS data (DR7), broadband photometry (u, g, r, i, z)




10k deg², 0.3 billion objects, 1.2M spectra



Nearest Neighbor Models
use k-Nearest Neighbors / local model



works fine in high dimensions (>3 but <50)
no physical assumptions required
good reference samples available
want to deal with missing values? change $N_k(\vec{x})$!

$$\sum_{j=1}^{n} x_j^2 = x_1^2 + x_2^2 + \dots + x_{\mathrm{missing}}^2 + \dots + x_n^2, \quad \text{with} \quad x_{\mathrm{missing}}^2 = \frac{1}{n-1} \sum_{j=1..n,\ j \neq \mathrm{missing}} x_j^2$$
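The substitution for a missing value can be sketched in Python as follows. This is a minimal illustration, not the code used in the talk: the function names, the use of NaN as the missing-value marker, the reading of the $x_j$ as components of the difference between query and reference object, and the plain mean over the k neighbors are all assumptions. The missing squared term is replaced by the mean of the observed squared terms, which reduces to the $1/(n-1)$ expression when exactly one value is missing.

```python
import numpy as np

def squared_distance_with_missing(a, b):
    """Squared Euclidean distance; NaN entries mark missing values (assumed convention).

    Each missing squared difference is replaced by the mean of the
    observed squared differences.
    """
    diff2 = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2
    observed = ~np.isnan(diff2)
    if not observed.any():
        raise ValueError("no observed dimensions to compare")
    fill = diff2[observed].mean()
    return float(np.where(observed, diff2, fill).sum())

def knn_regress(query, ref_x, ref_y, k=3):
    """Predict the target of `query` as the mean target of its k nearest references."""
    d2 = np.array([squared_distance_with_missing(query, row) for row in ref_x])
    nearest = np.argsort(d2)[:k]
    return float(np.asarray(ref_y)[nearest].mean())

# Toy reference sample (hypothetical values, not SDSS data).
ref_x = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
ref_y = np.array([0.1, 0.2, 2.0, 2.1])
prediction = knn_regress(np.array([0.5, np.nan]), ref_x, ref_y, k=2)
```

The query with one missing feature is still matched to the two closest reference objects, because the missing term only rescales each distance by the same rule.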





Nearest Neighbor Models
applied to all quasars with spectra, SDSS DR7
$\Delta z_{\mathrm{norm}} = \frac{\Delta z}{1+z}$; quoted value: 0.033

Polsterer et al. 2013


Existing Models

RMSE($\Delta z_{\mathrm{norm}}$) = 0.19, MAD($\Delta z_{\mathrm{norm}}$) = 0.041

Laurino et al. 2011

RMSE($\Delta z_{\mathrm{norm}}$) = 0.25, MAD($\Delta z_{\mathrm{norm}}$) = 0.048
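The two scores quoted for the models can be computed along the following lines. This is a sketch under stated assumptions: the slides do not spell out the definitions, so the sign of the deviation, the use of the spectroscopic redshift in the denominator, and MAD taken as the median absolute deviation around the median are choices made here for illustration. The function names are hypothetical.

```python
import numpy as np

def normalized_deviation(z_spec, z_phot):
    """Delta z_norm = (z_phot - z_spec) / (1 + z_spec); denominator choice is an assumption."""
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    return (z_phot - z_spec) / (1.0 + z_spec)

def rmse(dz_norm):
    """Root mean square of the normalized deviations."""
    return float(np.sqrt(np.mean(np.square(dz_norm))))

def mad(dz_norm):
    """Median absolute deviation around the median (one common definition)."""
    dz_norm = np.asarray(dz_norm, dtype=float)
    return float(np.median(np.abs(dz_norm - np.median(dz_norm))))

# Toy values (hypothetical, for illustration only).
dz = normalized_deviation([1.0, 3.0, 0.0], [1.2, 3.0, 0.5])
score_rmse = rmse(dz)
score_mad = mad(dz)
```

RMSE is dominated by catastrophic outliers, while MAD describes the width of the core of the distribution, which is why the two numbers are reported side by side.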


Parallel Feature Selection
missing error values in the model ... test different feature combinations!


building and testing one model = 100 sec.


huge dataset + bad Python implementation

do it in parallel on a GPU


used OpenCL matrix update operation

improved reference sets
Gieseke et al. 2014


Complete Test
what are the best 4 features?


psf and model magnitudes in (u, g, r, i, z)


10 raw features + 45 colors = 55 features

$\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$, with $n = 55$, $r = 4$: 341,055 combinations



395 days with old code

now just 3 hours, on 1 GPU
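The counts on this slide can be checked with a few lines of Python. The variable names are illustrative, and treating the 45 colors as all pairwise differences of the 10 raw magnitudes is an assumption that happens to match the stated number. The 100 seconds per model is the figure quoted for the old code.

```python
import math

n_raw = 10                                  # psf and model magnitudes in u, g, r, i, z
n_colors = math.comb(n_raw, 2)              # assumed: one color per pair of raw magnitudes
n_features = n_raw + n_colors               # raw features plus colors
n_combinations = math.comb(n_features, 4)   # all subsets of 4 features

seconds_per_model = 100                     # old code, as quoted in the talk
days_old_code = n_combinations * seconds_per_model / 86_400
```

Running every combination sequentially at 100 seconds each gives just under 395 days, which matches the figure on the slide.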

result: four colors combining psf and model magnitudes of the u, g, r, i, z bands (individual terms not recoverable)



Forward Selection
can we be even better?


psf, model and Petrosian magnitudes in (u, g, r, i, z), plus errors and extinction

585 features
best 10 out of 585: 1,197,308,441,345,108,200,000 combinations

we need a better strategy!
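A rough estimate shows why an exhaustive test is out of reach here. The per-model time below is an assumption: it is derived from the 3 hours quoted for the 341,055 four-feature models on one GPU and simply reused for ten-feature models, which is optimistic.

```python
import math

n_subsets = math.comb(585, 10)                    # all subsets of 10 out of 585 features

# Assumed GPU rate: 3 hours for the 341,055 four-feature models.
seconds_per_model = 3 * 3600 / math.comb(55, 4)

years = n_subsets * seconds_per_model / (365.25 * 86_400)
```

Even at that rate the exhaustive search would take on the order of a trillion years.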


Forward Selection
apply greedy forward selection
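Greedy forward selection can be sketched as follows. This is a generic illustration on synthetic data, not the GPU implementation from the talk: the brute-force k-nearest-neighbor regressor, the hold-out validation split, the RMSE criterion, and all function names are assumptions.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=5):
    """Predict each test target as the mean target of its k nearest training points."""
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def greedy_forward_selection(train_x, train_y, val_x, val_y, n_select, k=5):
    """In each round, add the single feature that gives the lowest validation RMSE."""
    selected, history = [], []
    remaining = list(range(train_x.shape[1]))
    for _ in range(n_select):
        scores = {}
        for f in remaining:
            cols = selected + [f]
            pred = knn_predict(train_x[:, cols], train_y, val_x[:, cols], k)
            scores[f] = rmse(pred, val_y)
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
    return selected, history

# Synthetic data: only features 0 and 2 carry information about the target.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(400, 6))
y = 3.0 * x[:, 0] + np.sin(3.0 * x[:, 2])
selected, history = greedy_forward_selection(x[:300], y[:300], x[300:], y[300:], n_select=2)
```

With 585 candidate features and 10 rounds, this strategy evaluates 585 + 584 + ... + 576 = 5,805 models instead of every subset, at the price of no longer guaranteeing the globally best combination.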



Forward Selection
resulting features (as far as recoverable from the slide):
plain colors: u_psf - g_petrosian, z_psf - i_petrosian, g_psf - r_model, r_psf - z_model, r_model - i_model, i_psf - i_petrosian, z_psf - r_petrosian, g_model - g_petrosian
dereddened colors: dered(z_psf) - dered(i_petrosian), dered(g_psf) - dered(r_model), dered(r_psf) - dered(z_model), dered(r_model) - dered(i_model), dered(z_psf) - dered(r_petrosian)
terms combining squared g- and r-band quantities (model and petrosian); the leading symbols are not recoverable


Lessons Learned

new features ... optimized for machine learning
evaluate features ... to optimize instrumentation, surveys
comparable sets ... to have competition ... publish data and method
different database access ... optimized for machine learning


thanks for a great week in Sydney
