Document retrieved from a search engine cache. Address of the original document: http://hea-www.harvard.edu/AstroStat/Stat310_fMMV/jjs_20051011.pdf
Date modified: Thu Oct 13 00:09:33 2005
Jiashun Jin, Purdue University, 10/11/2005


Higher Criticism: Theory and Applications in Cosmology

Jiashun Jin, Statistics Department, Purdue University


Collaborators (alphabetically):

Nabila Aghanim      Université Paris Sud
Laura Cayón         Purdue University
David Donoho        Stanford University
Olivier Forni       Université Paris Sud
Jean-Luc Starck     Service d'Astrophysique
Anna Treaster       Purdue University


Agenda

· Higher Criticism statistic
· Optimal Adaptivity for detecting sparse Gaussian mixtures
· Application to nonGaussian detection
  - detection of cosmic string
  - WMAP first year data


Tukey's story

· Example: A young scientist administers 250 uncorrelated tests, out of which 11 were significant at the 5% level.
· Question: Is this surprising?
· Answer: No, we expect 250 × 5% = 12.5 tests to be significant at the 5% level.


Higher Criticism, Formalization proposed by Tukey

· Higher Criticism statistic:
  $HC_{.05,n} = \sqrt{n}\,[(\text{Fraction Significant at } .05) - .05] / \sqrt{.05 \times .95}$
  and typically, reject $H_0$ if and only if $HC_{.05,n} \ge 2$.
· Solution to the previous example: $HC_{.05,n} = [11 - 12.5] / \sqrt{250 \times .05 \times .95} = -.43$, $\Longrightarrow$ accept $H_0$.
· Higher Criticism, or "Second-Level Significance Testing," indicates the significance of the overall body of tests.
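The arithmetic of Tukey's example can be checked with a few lines of Python. This is only a sketch: the function name `tukey_hc` and its arguments are hypothetical, and the 5% level is the one used in the example.

```python
import math

def tukey_hc(n_tests, n_significant, level=0.05):
    """Second-level significance score for a body of n_tests tests.

    Compares the observed fraction of significant tests with the
    nominal level, standardized by the binomial standard deviation.
    """
    fraction = n_significant / n_tests
    return math.sqrt(n_tests) * (fraction - level) / math.sqrt(level * (1.0 - level))

# Tukey's example: 250 tests, 11 significant at the 5% level.
score = tukey_hc(250, 11)
print(f"HC = {score:.3f}")
```

The score is about -0.435 (the -.43 quoted in the example), far below the threshold of 2, so the body of tests gives no reason to reject the null.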


Our Proposal

We propose
$HC_n^* = \max_{0 < \alpha < 1} \sqrt{n}\,[(\text{Fraction Significant at } \alpha) - \alpha] / \sqrt{\alpha(1 - \alpha)}$

· Generalization of Tukey's $HC_{.05,n}$ to allow selection of the level $\alpha$
· Looking for an unusually large number of "moderate significances"


Only need p-values to implement HC

Obtain individual p-values by: $p_i = P\{|N(0,1)| \ge |X_i|\}$

· sort p-values: $p_{(1)} < p_{(2)} < \ldots < p_{(n)}$
· calculate the $i$th z-score: $HC_{n,i} = \sqrt{n}\,\left[\frac{i}{n} - p_{(i)}\right] / \sqrt{p_{(i)}(1 - p_{(i)})}$
· take the maximum: $HC_n^* = \max_{1 \le i \le n} HC_{n,i}$
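The three steps (p-values, sorted z-scores, maximum) can be sketched in plain Python. The function names are hypothetical. The two-sided normal p-value is computed with the identity P{|N(0,1)| >= |x|} = erfc(|x| / sqrt(2)), and p-values equal to 0 or 1 are skipped to avoid division by zero, which is a choice of this sketch and not something the slides specify.

```python
import math

def two_sided_p_values(xs):
    """p_i = P{|N(0,1)| >= |x_i|}, via the complementary error function."""
    return [math.erfc(abs(x) / math.sqrt(2.0)) for x in xs]

def higher_criticism(p_values):
    """Return (HC statistic, index i of the maximizing sorted p-value)."""
    n = len(p_values)
    best_score, best_index = -math.inf, None
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # degenerate p-value: z-score undefined (sketch-level choice)
        score = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        if score > best_score:
            best_score, best_index = score, i
    return best_score, best_index

# Tiny illustration with three hand-picked p-values.
hc, i_star = higher_criticism([0.5, 0.01, 0.9])
print(hc, i_star)
```

Returning the maximizing index alongside the statistic is convenient later, since the observations with the `i_star` smallest p-values are the ones that drive the score.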


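Detection of Sparse Gaussian Mixture

Hypothesis Testing:
$H_0: X_i \overset{i.i.d.}{\sim} N(0, 1), \quad 1 \le i \le n,$
$H_1^{(n)}: X_i \overset{i.i.d.}{\sim} (1 - \epsilon_n) N(0, 1) + \epsilon_n N(\mu_n, 1), \quad 1 \le i \le n.$

· Goal: testing $\epsilon_n = 0$ vs. $\epsilon_n > 0$
· Approach: study for what $(\epsilon_n, \mu_n)$ the hypotheses $H_0$ and $H_1^{(n)}$ are separable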


Subtlety of the Problem

Calibrate with:
$\epsilon_n = n^{-\beta}$, $\mu_n = \sqrt{2r \log n}$, $0.5 < \beta < 1$, $0 < r < 1$.

Challenges:
· Very sparse: $\epsilon_n \ll 1/\sqrt{n}$
· Moderate significance: $\mu_n < \sqrt{2 \log n}$
· Different from contiguity: $\mu_n$ increases with $n$
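To get a feel for this calibration, the sketch below draws a sample from the sparse mixture alternative, writing beta for the sparsity exponent and r for the signal-strength exponent. The function names and the example values n = 10,000, beta = 0.6, r = 0.5 are illustrative assumptions, not values from the talk.

```python
import math
import random

def calibrate(n, beta, r):
    """Return (epsilon_n, mu_n) with epsilon_n = n^(-beta), mu_n = sqrt(2 r log n)."""
    epsilon_n = n ** (-beta)
    mu_n = math.sqrt(2.0 * r * math.log(n))
    return epsilon_n, mu_n

def sample_sparse_mixture(n, beta, r, seed=0):
    """Draw n observations from (1 - eps) N(0, 1) + eps N(mu, 1)."""
    rng = random.Random(seed)
    epsilon_n, mu_n = calibrate(n, beta, r)
    sample = []
    for _ in range(n):
        # Each observation is shifted by mu_n with probability epsilon_n.
        shift = mu_n if rng.random() < epsilon_n else 0.0
        sample.append(rng.gauss(shift, 1.0))
    return sample

epsilon_n, mu_n = calibrate(10_000, beta=0.6, r=0.5)
xs = sample_sparse_mixture(10_000, beta=0.6, r=0.5)
print(epsilon_n, mu_n, math.sqrt(2.0 * math.log(10_000)))
```

With these values only about 40 of the 10,000 observations are expected to be shifted, and the shift (about 3.03) stays below sqrt(2 log n) (about 4.29), so the shifted observations do not stand out as the largest values in the sample.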


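Detection Boundary

Theorem 1 (Ingster 1999, Jin 2004). If $\epsilon_n = n^{-\beta}$, $\mu_n = \sqrt{2r \log n}$, $\frac{1}{2} < \beta < 1$, and $0 < r < 1$, then:

If $r > \rho^*(\beta)$, $H_0$ and $H_1^{(n)}$ separate asymptotically,
If $r < \rho^*(\beta)$, $H_0$ and $H_1^{(n)}$ merge asymptotically.

We call $r = \rho^*(\beta)$ the "detection boundary":

$\rho^*(\beta) = \begin{cases} \beta - \frac{1}{2}, & \frac{1}{2} < \beta \le \frac{3}{4}, \\ (1 - \sqrt{1 - \beta})^2, & \frac{3}{4} < \beta < 1. \end{cases}$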
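The detection boundary of Theorem 1 is simple to evaluate numerically. The sketch below writes beta for the sparsity exponent and r for the signal-strength exponent; the function names are hypothetical. The two branches are beta - 1/2 for beta up to 3/4 and (1 - sqrt(1 - beta))^2 above it, and they agree at beta = 3/4.

```python
import math

def detection_boundary(beta):
    """Detection boundary for a sparsity exponent 1/2 < beta < 1."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0.5 and 1")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2

def is_detectable(r, beta):
    """True when r lies strictly above the detection boundary at beta."""
    return r > detection_boundary(beta)

for beta in (0.6, 0.75, 0.84):
    print(beta, detection_boundary(beta))
```

For instance, at beta = 0.6 the boundary is 0.1, so a signal with r = 0.5 is asymptotically detectable, while at beta = 0.84 the boundary rises to 0.36.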


[Figure: phase diagram with the horizontal axis running from 0.5 to 1 and the vertical axis from 0 to 1, divided into three regions labeled Classifiable, Detectable, and Undetectable.]


Critical Value of Higher Criticism

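· Let $h(n, \alpha)$ be the critical value such that $P\{HC_n^* > h(n, \alpha)\} \le \alpha$
· $h(n, \alpha) \approx \sqrt{2 \log\log n}$, $0 < \alpha < 1$

Call $\alpha_n \to 0$ slowly enough if: $h(n, \alpha_n) / \sqrt{2 \log\log n} \to 1$, $n \to \infty$.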
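At a fixed n, the critical value can also be approximated by simulation and compared with sqrt(2 log log n). The sketch below is a Monte Carlo illustration under assumptions of its own: the p-values are drawn as independent Uniform(0, 1) variables (the null case), and the function names, n = 500, and 500 replications are arbitrary choices. The asymptotic approximation is a limit statement, so the simulated value at moderate n may differ from it noticeably.

```python
import math
import random

def hc_under_null(n, rng):
    """One draw of the HC statistic when all n p-values are Uniform(0, 1)."""
    p_sorted = sorted(rng.random() for _ in range(n))
    return max(
        math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        for i, p in enumerate(p_sorted, start=1)
        if 0.0 < p < 1.0
    )

def simulated_critical_value(n, alpha, reps=500, seed=0):
    """Empirical value exceeded by at most a fraction alpha of null draws."""
    rng = random.Random(seed)
    scores = sorted(hc_under_null(n, rng) for _ in range(reps))
    # Index chosen so that at most alpha * reps simulated scores lie above it.
    k = min(reps - 1, max(0, math.ceil((1.0 - alpha) * reps) - 1))
    return scores[k]

n = 500
h_sim = simulated_critical_value(n, alpha=0.05)
h_asymptotic = math.sqrt(2.0 * math.log(math.log(n)))
print(h_sim, h_asymptotic)
```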


Optimal Adaptivity of Higher Criticism

Theorem 2 (Donoho and Jin 2004). Consider the Higher Criticism test that rejects $H_0$ when $HC_n^* > h(n, \alpha_n)$, where the level $\alpha_n \to 0$ slowly enough. For every alternative $H_1^{(n)}(r, \beta)$ where $r$ exceeds the detection boundary $\rho^*(\beta)$, so that the likelihood ratio test would have full power, the Higher Criticism test also has full power:

$P_{H_1^{(n)}}\{\text{Reject } H_0\} \to 1.$


Cosmic Microwave Background (CMB)

CMB:
· Oldest light in the universe, a direct link to the early universe
· A relic of radiation from when the universe was 380,000 years old
· An almost perfect black body at a temperature of 2.725 Kelvin


Why study CMB

CMB provides a direct link to the very early universe:
· Discriminate between different models for the early universe
· How did the early universe evolve into the large-scale galaxies of today?


From 1965 to 2003

Figure 1: Small angular fluctuations in the CMB are predicted to be the imprints of the initial density perturbations which gave rise to the large-scale structures seen today. Red color: strong emission from the Milky Way.


Wavelet Approach for nonGaussian Detection

· The standard inflation model predicts that the CMB is Gaussian
· Other models or secondary effects have nonGaussian signatures
· nonGaussian detection: disentangle the different sources of nonGaussianity from one another
· The wavelet transform is a powerful tool for detecting nonGaussian signatures
  - isotropic à trous algorithm (Starck et al. 1998)
  - bi-orthogonal wavelet transform


For Today

· Consider n transform coefficients of CMB: $X_i$
· Test the hypothesis: $H_0: X_i \overset{iid}{\sim} N(0, 1), \quad 1 \le i \le n$

Goal. By comparing different statistics:
· learn the strengths and weaknesses of different tests
· look for the optimal tests in idealized cases


Wavelet Based nonGaussian Tests

1. Excess kurtosis ($\kappa$): $\kappa(X_1, \ldots, X_n) = \frac{1}{n} \sum_i X_i^4 - 3$
2. Maximum (Max): $\mathrm{Max}(X_1, \ldots, X_n) = \max\{|X_1|, |X_2|, \ldots, |X_n|\}$
3. Higher Criticism (HC)
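The first two statistics are one-liners once the coefficients have been normalized to zero mean and unit variance. The sketch below assumes that normalization has already been done and takes the kurtosis as the sample mean of fourth powers minus 3 (the Gaussian fourth moment); the function names are hypothetical.

```python
def excess_kurtosis(xs):
    """Mean of x^4 minus 3, for coefficients assumed already standardized."""
    return sum(x ** 4 for x in xs) / len(xs) - 3.0

def max_statistic(xs):
    """Largest absolute value among the coefficients."""
    return max(abs(x) for x in xs)

coefficients = [0.0, 2.0, -2.0, 0.0]  # toy values, not real wavelet coefficients
kappa = excess_kurtosis(coefficients)
largest = max_statistic(coefficients)
print(kappa, largest)
```

The kurtosis averages over all coefficients, so it reacts to changes in the bulk of the data, whereas Max depends only on the single most extreme coefficient.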


Heuristic Comparison

Each test is sensitive only to certain types of nonGaussianity:
· $\kappa$: deviation of the 4th moment from the Gaussian value
· Max: unusual behavior of very large observations
· HC:
  - unusual behavior of extreme values
  - unusual behavior of moderately large values


Application I: Detecting Cosmic String

Cosmic string:
· an important source of nonGaussianity in CMB
· line-like object
· very old: formed within $\frac{1}{100}$ second after the Big Bang
· very thin: $10^{-22}$ m
· very heavy: 10 km weighs the same as the Earth


Why Look For Cosmic String

· a most promising candidate for forming modern galaxies
· a direct link to the very early universe
· not yet detected
· cannot be produced in the lab (extremely high energy)

Goal: develop the most sensitive detection tools


Detecting nonGaussian Convolution Component

· given superposed image: $\sqrt{1 - \lambda} \cdot \mathrm{CMB} + \sqrt{\lambda} \cdot \mathrm{CS}$,
· test: $\lambda = 0$ vs. $\lambda > 0$


[Figure: two histograms of sample data, each with a panel titled "QQ Plot of Sample Data versus Standard Normal" (axes: Standard Normal Quantiles, Quantiles of Input Sample).]

· equivalent to test:
  $H_0: X_i = z_i, \quad 1 \le i \le n,$
  $H_1^{(n)}: X_i = \sqrt{1 - \lambda} \cdot z_i + \sqrt{\lambda} \cdot w_i, \quad 1 \le i \le n.$
  - $z_i \overset{i.i.d.}{\sim} N(0, 1)$: wavelet coefficients of CMB
  - $w_i \overset{i.i.d.}{\sim} W$: wavelet coefficients of CS
  - $W$ unknown, but symmetric and heavy-tailed


Calibrations

Need careful calibrations for subtle analysis:
· the increasing amount of data is offset by an increasingly challenging problem: $\lambda = \lambda_n = n^{-r}$, $r > 0$
· $W$: symmetric and has a power-law tail with index $\alpha$:
  $\lim_{x \to \infty} x^{\alpha} P\{|W| > x\} = C_{\alpha}$, $\quad C_{\alpha}$: constant

Question: Fix $(r, \alpha)$ and let $n \to \infty$; what is the optimal test?


[Figure: detection regions, with r on the vertical axis (0 to 1) and the horizontal axis running from 2 to 18. Regions labeled: Undetectable; Detectable for Max/HC; Detectable for Kurtosis; Detectable for Kurtosis/Max/HC.]


Interpretation

$\alpha = 8$ is the separating line:
· $E[W^8] < \infty$: Kurtosis is better
  - $W$ has a relatively thin tail, nonGaussianity affects the bulk of the data
  - best tests: tests based on moments
· $E[W^8] = \infty$: HC/Max is better
  - $W$ has a relatively heavy tail
  - nonGaussianity has little effect on the bulk of the data, but a large effect on extreme values and moderately large values
  - best tests: tests based on data tails


Estimating $\alpha$

· Analysis supports the power-law tail assumption of $W$
· A classical estimator for $\alpha$ is Hill's estimator (Ann. Statist. 1975)
· Implementation of Hill's estimator: $\hat{\alpha} \approx 6.1$, $\mathrm{std}(\hat{\alpha}) \approx 0.9$
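For reference, here is a minimal sketch of Hill's estimator in its standard textbook form: the reciprocal of the average log-excess of the k largest magnitudes over the (k+1)-th largest. It is not the implementation behind the numbers quoted above. The function name, the choice of k, and the synthetic Pareto sample (true index 6) are assumptions made for illustration.

```python
import math
import random

def hill_estimator(samples, k):
    """Hill estimate of the tail index from the k largest |samples|."""
    magnitudes = sorted((abs(x) for x in samples), reverse=True)
    if not 0 < k < len(magnitudes):
        raise ValueError("k must satisfy 0 < k < number of samples")
    threshold = magnitudes[k]  # the (k+1)-th largest magnitude
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    mean_log_excess = sum(math.log(m / threshold) for m in magnitudes[:k]) / k
    return 1.0 / mean_log_excess

# Synthetic check on Pareto data whose true tail index is 6.
rng = random.Random(0)
pareto_sample = [rng.paretovariate(6.0) for _ in range(20_000)]
alpha_hat = hill_estimator(pareto_sample, k=2_000)
print(alpha_hat)
```

The estimate depends on k, the number of upper order statistics used: a small k gives a noisy estimate, while a large k pulls in observations from outside the power-law regime.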


Large n

· The finer the resolution of the image, the larger the n
· Need large n
  - to see the real advantage of HC
  - to better answer whether $\alpha < 8$ or $\alpha > 8$
  - to better answer which of HC and Kurtosis is better


Application II: WMAP First Year Data

http://map.gsfc.nasa.gov/

· WMAP radiometers observe at 5 frequency bands with one or more receivers: K (1), Ka (1), Q (2), V (2), W (4).
· The WMAP team suggested using the weighted average of the Q-V-W bands (8 receivers)
· Foreground cleaned
· Mask added: strong emission from the Milky Way, etc.
· Downgraded from nside = 512 to nside = 256: measurement noise is dominant at the smallest scale


Statistical Analysis

1. Generate 5,000 simulated Gaussian maps of CMB.
2. For WMAP and each simulated map:
   · Use Spherical Mexican Hat Wavelets (SMHW): 2-D spherical wavelets
   · Normalize the wavelet coefficients
   · Apply kurtosis, Max, and HC to the wavelet coefficients


[Figure: K, Max, HC and HC+ for the analyzed WMAP data set (crosses) plotted against wavelet scale, with dashed, dotted-dashed and solid lines marking the confidence regions.]

Figure 2: Test scores on WMAP and 67%, 95%, and 99% confidence regions on 5,000 simulated CMB maps.



Comparisons of Different Statistics

1. Almost equally powerful for detection; kurtosis is slightly better
   · Define the empirical confidence of detection: $\#\{\text{test scores based on simulations} \le \text{score on WMAP}\} / 5000$
   · Kurtosis: 99.7%
   · HC: 99.46%
   · Max: 99.44%
2. Higher Criticism: automatically identifies a tiny portion of the data as the source of nonGaussianity
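The empirical confidence of detection is simply the fraction of simulated scores that do not exceed the observed one. A minimal sketch, with a hypothetical function name and made-up scores standing in for the 5,000 simulated maps:

```python
def empirical_confidence(simulated_scores, observed_score):
    """Fraction of simulated scores that are <= the observed score."""
    below = sum(1 for s in simulated_scores if s <= observed_score)
    return below / len(simulated_scores)

# Toy stand-in for 5,000 simulated test scores (not real simulation output).
simulated = list(range(5000))
confidence = empirical_confidence(simulated, observed_score=4984)
print(f"{100 * confidence:.2f}%")
```

In this toy setup 4,985 of the 5,000 simulated scores are at or below the observed one, which gives 99.70%.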

Source of nonGaussianity

· $HC_n^* = \max_{0 < \alpha < 1} HC_{n,\alpha}$, $\quad HC_{n,\alpha} = \sqrt{n} \cdot [(\text{Fraction Significant at } \alpha) - \alpha] / \sqrt{\alpha(1 - \alpha)}$
· $HC_{n,\alpha} \gg 1$ implies nonGaussianity

[Figure: HC plotted against i/ndata (pixel), with a zoomed panel covering 0.97 to 1.]

Figure 3: Plot of $HC_{n,\alpha}$ versus $(1 - \alpha)$ for wavelet coefficients of WMAP at Scale 9


Figure 4: The selected coefficients map back to pixels in a ring centered at (209°, -57°). We map each coefficient to only one pixel. This does not mean that only the pixels on the ring are the source of nonGaussianity.


[Figure: K, Max, HC and HC+ plotted against scale for the analyzed WMAP data set after subtracting the pixels on the ring (crosses), with dashed, dotted-dashed and solid lines marking the 68%, 95% and 99% confidence regions.]

Figure 5: Kurtosis, Max, and HC for WMAP after removing the ring. No detection of nonGaussianity at the 90% level.



Comparison to Other Works

· Some works report 99% confidence of nonGaussian detection, and some identify the cold spot centered at (209°, -57°).
· Our contribution:
  - Add new statistics to nonGaussian detection: HC and Max
  - Almost as powerful as kurtosis
  - HC offers automatic identification of a tiny portion of the data as the source of nonGaussianity
  - The location of the ring coincides with the cold spot reported by Vielva et al. 2004, Cruz et al. 2005


Take Home Messages

· nonGaussian detection in CMB is an exciting field
· Higher Criticism is a promising new detection tool that adds more discussion to nonGaussian detection
· a better answer is expected in future studies with a larger n
