Документ взят из кэша поисковой машины. Адрес оригинального документа : http://brain.bio.msu.ru/papers/Leonowicz_Karvanen_Shishkin_2005_JNeuroSciMeth_TrimmedAveraging_draft.pdf
Дата изменения: Thu Jul 15 15:18:30 2004
Дата индексирования: Mon Oct 1 20:22:39 2012
Кодировка:

Trimmed estimators for robust averaging of eventrelated potentials

Zbigniew Leonowicz 1, Juha Karvanen and Sergei L. Shishkin

2

Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama 351-0198, Japan

Abstract

Averaging (in statistical terms, estimation of the lo cation of data) is one of the most commonly used pro cedures in neuroscience and the basic pro cedure for obtaining event-related p otentials (ERP). Only the arithmetic mean is routinely used in the current practice of ERP research, though its sensitivity to outliers is wellknown. Weighted averaging is sometimes used as a more robust pro cedure, however, it can b e not sufficiently appropriate when the signal is nonstationary within a trial. Trimmed estimators provide an alternative way to average data. In this pap er, a numb er of such lo cation estimators (trimmed mean, Winsorized mean and recently intro duced trimmed Lmean) are reviewed, as well as arithmetic mean and median. A new robust lo cation estimator tanh, which allows the datadep endent optimization, is prop osed for averaging of small numb er of trials. The p ossibilities to improve signal-to-noise ratio (SNR) of averaged waveforms using trimmed lo cation estimators are demonstrated for ep o chs randomly drawn from a set of real auditory evoked p otential data. Key words: averaging, eventrelated p otentials, evoked p otentials, mean, median, trimmed mean, robust estimators of lo cation, trimmed estimators

Preprint submitted to Journal of Neuroscience Metho ds

15 July 2004

1

Intro duction

Averaging is probably the most common basic statistical pro cedure in exp erimental science. It is used for estimating the lo cation of data (or "central tendency") in the presence of random variations among the observations, which can b e removed by this pro cedure. Data variations can b e the result of variations in the phenomenon of interest or of some unavoidable measuring errors. In signal pro cessing terms, this can b e considered as contamination of useful "signal", such as eventrelated brain activity, by useless "noise", such as artifacts and ongoing activity, b oth not rep eatedly asso ciated with the event. In the case of linear summation of signal and noise ("additive mo del"), the fact that only the eventrelated signal is timelo cked to the event and noise is not timelo cked, allows the cancellation of the noise by averaging the data separately for each time p oint relative to the event. Averaging is typically done using arithmetic mean, which is the most widely known estimator of the lo cation of the data. Eventrelated averaging is imp ortant for various techniques from single neuron firing recording to optical imaging, but is most essential for the old and
Corresp onding author. Address: Lab oratory for Advanced Brain Signal Pro cessing, RIKEN Brain Science Institute, 21 Hirosawa, Wakoshi, Saitama 351-0198, Japan. Phone: +81484679765 Fax: +81484679694 Email addresses: Zbigniew.Leonowicz@pwr.wroc.pl (Zbigniew Leonowicz), juha.karvanen@hut.fi (Juha Karvanen), shishkin@brain.riken.jp (Sergei L. Shishkin). URL: http://www.bsp.brain.riken.jp. 1 on leave from Wro claw University of Technology, Wro claw, Poland 2 Dept. of Human and Animal Physiology, Faculty of Biology, M. V. Lomonosov Moscow State University, Russia

2

still imp ortant technique of eventrelated p otentials (ERP). ERPs are voltage fluctuations rep eatedly asso ciated in time with some physical or mental events, which can b e extracted from the ongoing electro encephalogram (EEG) using signal averaging (Picton et al., 2000). The term "evoked p otentials" (EPs) is used for an imp ortant subset of ERPs which are "evoked" by a certain event, usually a sensory one, but not by indep endent endogenous pro cesses. Additive mo del is most commonly accepted for event-related activity and ongoing EEG, allowing the use of averaging for the extraction of ERP. Though a numb er of studies cast doubts on the additive mo del by demonstrating that eventrelated mo difications of the ongoing EEG may also contribute to ERP (Makeig et al., 2002; Jansen et al., 2003), averaging has b een proved to b e practically a very efficient pro cedure and is used, therefore, as the basic procedure in ERP analysis (Picton et al., 1995; Lop es da Silva, 1999; Picton et al., 2000).

Temp oral resolution in the order of milliseconds makes ERPs an imp ortant to ol for estimating of the timing for information pro cessing in the human brain (Picton et al., 2000). Thus, the improvement of signaltonoise ratio by averaging, which is the essence of computing the ERP, should receive a serious attention. In particular, an imp ortant task is the development of efficient metho ds for averaging of small sets of singletrial ERPs, where data may strongly deviate from Gaussian distribution. Due to deviation from Gaussian distribution, their lo cation may not b e estimated correctly by arithmetic mean. The numb er of data ep o chs (which corresp ond to exp erimental trials) used for averaging should b e as low as p ossible b ecause of various reasons: fatigue, learning and other factors affecting the brain resp onse of the sub ject; even the way the sub jects p erform the task may change if the task is rep eated 3

many times. In some applications such as BrainComputer Interface (BCI) only a few trials can usually b e used each time the averaging is computed (Wolpaw et al., 2002).

In some cases the improvement of ERP averaging can b e obtained with various techniques which comp ensate for the trialtotrial variability, mainly the latency jitter. Such approaches, however, require that the ERP could b e recognizable in single trials, which is often imp ossible, and involve the risk that the outcome can b e merely the result of lining up the background noise (Picton et al., 1995, 2000).

Improving the noise reduction by averaging can b e obtained with a technique called weighted averaging. In this metho d, each ep o ch is given a weight dep ending on estimated noise in this ep o ch (Hoke et al., 1984; LutkenhЁ Ё oner et al., 1985; Davila and Mobin, 1992; Lё eski, 2002). Though quite many variations of weighting averaging have b een develop ed (for review see, e.g., Lё eski (2002)), this metho d is still rarely used for averaging of ERP. As stated in Ё Ozdamar and Kalayci (1999), there are still unsolved issues related to computation of weights and their influence on the result of averaging. An imp ortant limitation of weighted averaging comes from the noise mo del which it assumes. According to this mo del, noise varies b etween ep o chs, but is stationary within each ep o ch (LutkenhЁ Ё oner et al., 1985). However, many typ es of noise (not only artifacts but also waves of ongoing EEG) can strongly vary within an ep o ch, having high amplitude in some time p oints and low amplitudes in other ones. Thus, the weights can b e underestimated for the part of ep o ch where a strong noise o ccurs and overestimated for another part of ep o ch where the noise is low. 4

Median averaging is another approach suggested for minimizing the influence of noise in ERPs (Borda and Frost, 1968; Yab e et al., 1993; Picton et al., 1995; Ё Ozdamar and Kalayci, 1999; Fox and Daleb out, 2002). It is similar to conventional averaging on the basis of arithmetic mean, with only difference that median is used instead of mean. Computation of median includes ordering of samples according to their amplitudes of all ep o chs for each time p oint relative to stimulus, indep endently from other time p oints. Due to this indep endence, median averaging cannot b e affected by the nonstationarity of noise within an ep o ch. It was shown that median averaging can improve averaging of endogenous ERPs using small numb er of trials (Yab e et al., 1993). A detailed study Ё by Ozdamar and Kalayci (1999) demonstrated the advantages of median averaging over conventional averaging for auditory brain stem resp onses which have low SNR due to very low amplitude and thus requires high numb er of ep o chs to b e averaged.

The disadvantage of median averaging is that it do es not only remove the outliers but also uses the rest of data only in the sense of the order of the values. It is evident that some useful information can b e lost by this pro cedure, comparing to conventional averaging, which employs the data values themselves instead of their order. In practice, median averaged waveforms include a rather strong high frequency noise (though it seems to b e easily removable Ё by filtering (Ozdamar and Kalayci, 1999)), and the results are not always improved relative to conventional mean averaging (Fox and Daleb out, 2002). One should also consider a p ossibility of unpredictable effects arising from the "overrobustness" of median. For example, the value of median will not change at all if we add a very large value to each of data values ab ove median (Streiner, 2000). 5

Is it p ossible to combine advantages of mean and median averaging? In fact, an estimator of data lo cation which lies b etween those two extremes already exists and its name is "trimmed mean". In this metho d (see the exact definition in the next section), a part of extreme values is discarded or mo dified, but all other values are used for averaging in the same way as in conventional mean averaging. It is imp ortant to note that rejection of extreme values is quite different from the pro cedure of artifact rejection. The latter pro cedure typically implies removing not only extreme values but all trials including extreme values or some other signs of artifacts and in this sense it is closer to weighted averaging, which defines the impact of an ep o ch by estimating it as a whole. Surprisingly, though trimmed mean and its mo dified version, Winsorized mean, are efficient robust lo cation estimators (Stuart, 1994), averaging of ERPs on the basis of these estimators has never b een rep orted, to the b est of our knowledge. The goal of this pap er is to demonstrate the efficiency of trimmed estimators of data lo cation for computing ERPs and to prop ose some ways to optimize their parameters, such as trimming parameters and parameters for weights related to order of averaged amplitudes. In this pap er, we: (1) briefly review a numb er of lo cator estimators, namely mean, median, and three trimmed estimators: trimmed mean, Winsorized mean and recently prop osed trimmed Lmean, (2) rep ort results of their testing using real auditory evoked p otential (AEP) data and their resampling, (3) prop ose a new adaptable lo cation estimator. 6

2

Statistical estimators of lo cation

2.1 Problem of robustness of estimation of the data location

The problem of sensitivity of an estimator to the presence of outliers, i.e. "the data p oints that deviate from the pattern set by the ma jority of the data set" (Hamp el et al., 1986), has lead to the development of robust lo cation measures. Robustness of an estimator is measured by the breakdown value, which tells us how many data p oints need to b e replaced by arbitrary values in order to make the estimator explo de (tend to infinity) or implo de (tend to zero). For instance: arithmetic mean has 0% breakdown whilst median is very robust with breakdown value of 50% (Hamp el et al., 1986).

2.2 Arithmetic mean

The most widely used statistical measure and the b est known estimator of lo cation is the arithmetic mean (see Figure 1(a)).
N

µ

mean

=
i=1

1 x N

i

(1)

The arithmetic mean is a standard lo cation estimator used for averaging of ERPs and for many other purp oses, however, it is not robust. In the case of arithmetic mean, only one outlier may make the estimate infinitely large or 1 small. The breakdown value b ecomes here lim = 0. N N 7

2.3 Robust location measures

Many lo cation estimators can b e presented in unified way by ordering the values of the sample as x
(1)

x

(2)

... x

(N )

and then applying the weight

function wi (Stuart, 1994)
N

µr =
i=1

wi x(i) ,

(2)

where wi is a function designed sp ecifically to reduce the influence of certain observations (data p oints) in form of weighting and x data. For the arithmetic mean it holds wi =
1 N (i)

represents the ordered

.

To make the comparison b etween different estimators easier, we present all weighting functions (as in (2)) plotted in Figure 1.

2.4 Median

Supp ose that the data have the size of (2M + 1), where M is a p ositive integer, then the median is the value of the (M + 1)
th

ordered observation. In the case

of even data size 2M the median is defined as the value of the mean of the samples M and M + 1. According to the framework of equation (2) the weight one is applied to the (M + 1)th sample in the case when the numb er of samples is o dd and weights equal to
1 2

to b oth M

th

and (M + 1)

th

samples when the

numb er of samples is even (Stuart, 1994) (see Figure 1(b)). 8

2.5 Trimmed mean

For the trimmed mean (where p = N ) the weights wi as in (2) can b e defined as: wi =

1 N -2p

, p+1iN -p otherwise

(3)

0,

Thus, the trimmed mean corresp ond to the mean value of data samples where p highest and p lowest samples are removed (see Figure 1(c)). Application of trimming lowers the influence of extreme data values on the result of averaging. However, unlike in median, substantial part of data can b e included into average.

2.6 Winsorized mean

In the case of trimmed mean, the tails of the distribution of the data are simply ignored. It can lead to the loss of information and should b e avoided when the sample size is small. Winsorized mean is similar to trimmed mean with the exception that it replaces each observation in each fraction (p = N ) of the tail of the distribution by the value of the nearest unaffected observation. Weight wi b ecomes here (see Figure 1(d))

0,
p+1 N 1 N

i p or i N - (p - 1) , i = p + 1 or i = N - p p + 2 i N - (p + 1) (4)

wi =

,

Usually, the values in the range 0 p 0.25N are considered, dep ending on the heaviness of the tails of the distribution. 9

An interesting observation is that the median can b e viewed as an extreme case of the trimmed mean or Winsorized mean when only one or two central data p oints are retained (Stuart, 1994).

2.7 Trimmed Lmean (TLmean)

Recently, Elamir and Seheult (2003) prop osed the trimmed Lmoments (TL moments) as a generalization of Lmoments (Hosking, 1990). The TLmean can b e estimated from a sample as a linear combination of order statistics. Using the formulation of equation (2) the weight function can b e calculated as follows (p = 0 for arithmetic mean)

wi =

(

i-1 p

)(

N -i p

)

( 0,

N 2p+1

)

, p+1iN -p (5) otherwise,

where

i p

=

i! p!(i-p)!

. Equation (5) reveals the connection b etween the TLmean

and the trimmed mean. In the calculation of the b oth statistics, the extreme observations are ignored. The main difference is that the trimmed mean applies the equal weight to the remaining observations whereas the TLmean uses higher weights for the observations near the median (see Figure 1(e)). The variance of various lo cation estimators was compared in Elamir and Seheult (2003) for samples originating from normal, logistic, doubleexp onential and normal distribution with outliers. The conclusions of this simulations are that TLmean is suitable for general use and p erforms reasonably well for normal and heavytailed distributions. Such information is not precise enough for the purp ose of averaging of ERPs and further investigations are necessary. 10

2.8 New empirical estimator: tanh mean

We prop ose a new estimator to alleviate the problem of enhancing noise by robust estimators such as trimmed mean or median. The weights are calculated using the hyp erb olic tangent functions (see Figure 1(f )) in such way that the signal samples are weighted by

wi =

tanh[k (i)] - s,
i 2

i<

N 2 N 2

(6)

- tanh[k ( - N )] + s, i

where k is the factor controlling the slop e of the weights for extreme values in data and s determines the vertical shift. The robustness is achieved by setting all weights with negative values to zero so the extreme observations will b e ignored, like for trimmed mean, Winsorized mean or TL-mean. Such formulation allows the control over the robustness (cancelling of extreme values) and over the influence of extreme values on the final estimate of the averaged trial. The shap e of the weighting function dep ends on parameters k and s. This estimator b elongs to the group of trimmed estimators by its ability to cancel of extreme values. However, its main feature is a flexible and optimized adjusting of the degree of influence of the values close to the extremes on the result of averaging. Both parameters k and s can b e optimized to achieve the b est p ossible value of signaltonoise ratio (SNR) (or of another computable p erformance index) of the averaged trials. In order to avoid that the estimator b ecomes biased it is necessary to estimate the parameters on a separate data set ("hold out" data set) different from that for which the SNR is computed. For the results rep orted b elow in this pap er the optimization was accomplished using 11

the NelderMead (Simplex) metho d of unconstrained nonlinear optimization (Nelder and Mead, 1965). The NelderMead metho d do es not require the ob jective function (such as SNR) to b e differentiable. Therefore it is well suited to problems involving a nondifferentiable ob jective function of a small numb er of decision variables. Optimization of the new estimator involves the function of only two variables. The NelderMead (Simplex) metho d is usually emb edded in standard optimizing packages. In order to address the small sample p erformance (which shows how the p erformance of the estimator degrades when decreasing the numb er of available data) (Gastwirth and Cohen, 1979) an exp eriment was carried out (see the section 4 for description and results).

3

Exp erimental setup

Three healthy male sub jects with normal hearing (age 2840) participated in the study. They seated in a chair in a dimly illuminated electrically shielded ro om with closed eyes and were asked to b e relaxed and listen to the sequence of sound stimuli, namely 800 Hz (frequent) and 1200 Hz (rare) tones, do not pay attention to frequent tones but press a button with right index finger as so on as p ossible when they hear the rare highpitch tone. Stimuli were presented using Neuroscan (http://www.neuro.com ) STIM system binaurally through earplugs (Tub ephone Insert Earphones ER-3 ABR, by Etymotic Research http://www.etymotic.com ) at 70 dB SPL. Each sound stimulus was a pure sine wave with duration of 35 ms (first and last 5 ms were linearly rising and falling). After a short practice session, exp erimental sessions 12

(3 sessions for sub jects 1 and 2, 2 sessions for sub ject 3) with breaks b etween them were run, with 689 stimuli (b oth rare and frequent) presented in each of them. Stimulus onset asynchrony (for b oth typ es of stimuli) was random numb ers uniformly distributed b etween 0.4 and 1.6 s. Rare tones were approximately 10% of the total numb er of stimuli. The numb er of frequent stimuli b etween rare stimuli was random, however, two constrains were set: stimulus onset asynchrony b etween a rare stimulus and the next frequent stimuli could not b e b elow 1.2 s; and at least 3 frequent stimuli were placed b etween each two random stimuli. The EEG was recorded with Neuroscan Data Acquisition System (software version 4.3.1, SynAmps 5083) from C3 and C4 electro des referenced to electrically linked earlob es. Amplifier gain was 1000, A/D bit size 12 (providing the resolution of 0.084 µV/bit), amplitude range 5.5 mV, sampling rate 500 Hz, analog filter bandpass 0.1-70 Hz (slop es 12dB/o ctave), notch filter was not used. Imp edance was b elow 5 k for sub jects 1 and 2. For the sub ject 3 it was p ossible to achieve the imp edance b etween 20 and 40 k and the EEG seemed to b e slightly distorted, but no line noise was visible (this EEG was used as an example of a recording of lower quality). Only the data recorded from C4 electro de was used. Before segmenting into ep o chs, the EEG was lowpass filtered (with zero phase shift, 96 dB/o ct. digital filter) b elow 30 Hz. For the analysis, ±500 ms ep o chs related to frequent stimuli (not requiring the resp onse) were extracted. The 16 frequent stimuli from the b eginning of each session up to 2
nd

such stimulus after the first

rare stimulus in the session and all frequent stimuli immediately following any rare stimulus were not used in the further analysis. For a few cases of missed or wrong resp onses (not related to rare stimulus), 2 frequent stimuli 13

immediately preceding and 2 immediately following the related stimuli were also excluded from the analysis. This resulted in 1787, 1796 and 1193 valid ep o chs for sub jects 1, 2 and 3, resp ectively (b elow: raw data). Some of these ep o chs contained artifacts, such as electromyogram (EMG) or artifacts caused by movements, but no clipp ed data. A "cleaned" subset of the data was formed by rejecting all ep o chs in which the amplitude in any time p oint (within the interval of ±500 ms relative to the stimulus) exceeded a threshold of 50 µV. Visual insp ection confirmed that the ma jority of these ep o chs contained some artifacts and that no visible artifacts (except, in rare cases, low amplitude EMG) were present in the remaining ep o chs. The numb er of ep o chs in "cleaned" subsets was 1752, 1776 and 1104 for sub jects 1, 2 and 3, resp ectively.

4

Results

Examples of averaged waveforms for all three sub jects are given in Figure 2. Median averaging pro duced more noisy averaged ERPs, which is more evident in the prestimulus time interval (where, ideally, the averaged signal should b e close to zero). The difference b etween the other estimators is, however, not clearly visible in these plots. We estimated the efficiency of different averaging metho ds and different trimming parameters by estimating SNR in randomly chosen subsets of ep o chs. It is commonly assumed (e.g., Davila and Mobin (1992)) that signal is contained in the p oststimulus part of the averaged data, while noise is included in b oth p oststimulus and prestimulus part. Prestimulus variance usually was 14

much lower than p oststimulus variance in our data, thus the impact of noise variance to p oststimulus interval was insignificant and we considered the p oststimulus part (approximately) as "signal". The following formula was used, therefore, for the calculation of signaltonoise ratio:

SNR = 10 · log

10

2 poststim. [dB] 2 prestim.

(7)

where

2 poststim.

is signal plus noise variance, defined as variance of the p ost-

stimulus interval (in our data, 2 400 ms relative to stimulus onset) of the averaged ERP, and
2 pr estim.

is "noise variance", defined as variance of the

prestimulus interval (in our data, 400 0 ms relative to stimulus onset) of the averaged ERP. The SNR was estimated according to (7) for each sub ject after averaging, using one of the lo cation estimators, a subset of 31 ep o chs randomly drawn from the whole "cleaned" set of ep o chs. The pro cedure was rep eated 100 times and SNR values were averaged (using the arithmetic mean). The results obtained for different estimators and different trimming parameters are presented in Figures 3(ac). For two sub jects (Figure 3(a) and 3(c)), mean averaging pro duced b etter SNR not only comparing to median, but also comparing to trimmed estimators almost for all values of trimming parameters. For one sub ject, b est SNR was obtained with trimmed estimators (Figure 3(b)). The fact that mean p erformed b etter than any other estimator in two cases can b e related to the high homogeneity of our data, which could b e the result of the absence of strong artifacts, stable attention provided by the exp erimental pro cedure and other factors. Results were similar for raw data sets 15

(not shown in the figure) which contained only small numb er of ep o chs with artifacts. To investigate effects of stronger inhomogeneity of the data, we simulated the alpha rhythm, an EEG comp onent which often b ecomes a serious problem for ERP averaging but was low in our recordings for all three subjects. White noise (random normally distributed numb ers) was generated and filtered by Butterworth filter of 2
nd

order with the passband of 911 Hz. Ran-

domly selected non-overlapping intervals of this simulated signal were added to randomly selected 20% of ep o chs of each sub ject's cleaned data set after multiplying by a constant computed for each ep o ch in such a way that the resulted maximum of absolute amplitude of added "alpha rhythm" in the ep o ch was 30 µV. This pro cedure resulted in the increase of data variance from 108 to 150 for sub ject 1, from 79 to 122 for sub ject 2 and from 172 to 212 for sub ject 3 (computed for all ep o chs together). The pro cedure of estimating the SNR describ ed ab ove was rep eated for these new "alpha" data sets. The results are presented in Figure 3(d-f ). Now, 2 of 3 sub jects had the b est SNR for the after a trimmed averaging pro cedure. The worst results for all three sub jects and b oth without and with adding simulated alpha rhythm were found for median. In the case of sub ject 3 after addition of the simulated alpha rhythm (Figure 3(d)) SNR was similar for mean and median, but in this case trimmed estimators provided esp ecially prominent improvement of SNR. It is p ossible that averaging using trimmed estimators can provide b est results if their parameters (trimming parameter for trimmed mean, Winsorized mean and TL-mean, and parameters k and s for the tanh mean) are optimized for a given typ e of data and/or given sample size. This is illustrated by the example presented in Figure 4. One hundred randomly chosen ep o chs from sub ject 1 16

(raw data set) were averaged using all lo cation estimators discussed in this pap er after optimizing the parameters of trimming estimators using a separate set of the same numb er of randomly chosen 100 ep o chs, and the SNR was estimated according to (7). Optimization was made using NelderMead simplex metho d of optimization (Bo ck, 1998) for the tanh estimator and by cho osing the parameter p which maximized the SNR for trimmed, Winsorized and TLmean. The pro cedure was rep eated after successive removal of an increasing numb er of randomly selected ep o chs. Averaged results for 100 rep etition of the ab ove pro cedure are plotted in Figure 4. It was p ossible for every numb er of removed ep o chs to obtain b etter SNR for trimmed estimators comparing to median and mean (which cannot b e optimized). The b est p erformance, esp ecially for smaller numb er of remaining ep o chs, was obtained for the tanh mean. The relation b etween the slop e parameter k of the tanh mean and the value of the criterion to b e maximized was also investigated. The results (not presented in this pap er) show that the global maximum is easily obtained for the new estimator with small risk of stuck into lo cal minima, due to applied metho d of optimization (Bo ck, 1998).

5

Discussion

Robust trimmed estimators of data lo cation gradually gain p opularity in various statistical applications but have not b een adopted for the ERP research yet. The examples presented ab ove demonstrate that trimmed estimators may improve the results of averaging, the pro cedure which is crucial for ERP analysis. The evidence they provide for such improvement is preliminary due to limited numb er of tests and sub jects in the study. However, we can claim that 17

if improvement of averaging of ERP is esp ecially imp ortant, trimmed estimators should b e considered as a p ossible alternative to conventional averaging.

As alternatives to conventional averaging, weighted averaging and median averaging were considered in the ERP literature so far. In the intro duction, we already argued that weighted averaging assumes quite unrealistic noise mo del. Median averaging seems to b e to o robust estimator which may discard to o large part of information presented in the data. Trimmed estimators are more robust than conventional mean but not as "overrobust" as median. Of course, when artifacts are few or can b e easily removed and data are very close to normal, there is no need to use other estimators than mean. But in many cases when strong deviations from Gaussianity o ccurs (e.g., when a small numb er of trials is averaged), averaging based on arithmetic mean can b e not sufficient and trimmed estimators b ecome a reasonable choice. Additional opp ortunities to improve averaging are given by weighting of the amplitude values from different ep o chs according to their rank, which is provided by TL-mean and by tanh mean prop osed in this pap er. Note that this typ e of weighting inside the averaging pro cedure is different from such pro cedure in usual weighted averaging, which utilizes ep o ch (trial) characteristics rather than the amplitudes only at the given time p oint; this is imp ortant for efficient pro cessing of highly nonstationary data. Trimming can b e also understo o d in terms of weighted averaging, as giving zero weight to extreme values (Figure 1 gives an idea of how this viewp oint can b e applied to all estimators studied in this pap er, including median and mean.). Because the trimming itself is evidently a rather rough pro cedure, more advanced weighting, as in TL-mean and tanh mean, can probably provide additional improvements of the results of averaging. As our current results show, averaging using trimmed estimators may 18

provide much smo other ERP waveforms than median averaging and at least in some cases provide b etter SNR of ERP than b oth median and arithmetic mean. However, no clear difference b etween trimmed mean, Winsorized and TL-mean was found in the cases of the b est choice of trimmed parameters for each of them. Optimization of the choice of the sp ecific lo cation estimator and its trimming parameters or any other parameters can b e done for each sp ecific data set. This pro cedure requires, of course, additional efforts and exp ertise. However, if an increase of SNR is strongly desirable, it can b e worthwhile at least to compare the results of averaging with, for example, usual arithmetic mean, trimmed mean (e.g., with trimming of 25% of data samples from each tail of the distribution) and TL-mean (with a small trimming parameter). Note that the trimming parameter in TL-mean and k and s for the tanh mean influence not only trimming but also the weights of non-trimmed samples (see (5), (4) and Figure 1). This fact explains why TL-mean's dep endence on trimming parameter is quite different from such dep endence for trimmed and Winsorized mean (Figure 3). More precise optimization, which is esp ecially imp ortant in the case of tanh mean, can b e done with NelderMead (Simplex) metho d of unconstrained nonlinear optimization (Nelder and Mead, 1965). The parameter to b e optimized should b e chosen carefully according to the ob jectives of the sp ecific study (note that different approaches to estimation of SNR in ERP exist; see, e.g., Hoke et al. (1984); Copp ola et al. (1978); Davila Ё and Mobin (1992); Ozdamar and Delgado (1996)). For unbiased optimization, a separate set of data with characteristics similar to the analyzed data should b e used. We considered in this pap er only symmetrically trimmed estimators, b ecause 19

they are appropriate for amplitude distributions without high asymmetry, and amplitude distributions of nonaveraged ERP data typically are not very asymmetrical. Asymmetric trimmed mean estimators allowing different prop ortion of trimming at lower and higher tails of the distribution (e.g., (Lee, 2003)) probably can b e also applied for averaging of ERPs, esp ecially for small samples where the asymmetry of the distribution may b e high. Trimmed estimators are a class of robust estimators of data lo cations which can help to improve averaging of ERPs when numb er of trials is small, the data are highly nonstationary and include outliers. Their advantages can b e understo o d as a reasonable compromise b etween median which is very robust but discard to o much information and arithmetic mean conventionally used for averaging which use all data but, due of this, is sensitive to outliers. Additional improvement of averaging can b e gained by intro ducing weighting of ordered data, as in newly intro duced TL-mean and tanh mean prop osed in this pap er.

6

Acknowledgement

The authors would like to thank Dr. Pando Gr. Georgiev for his valuable suggestions ab out the tanh averaging.

References Bo ck RK. Simplex for Metho d: Nuclear NelderMead. Research, CERN Europ ean 1998:online,

Organization

Geneva,

http://abbaneo.home.cern.ch/rkb/AN16pp/no de262.html Borda RP, Frost JD Jr. Error reduction in small sample averaging through 20

the use of the median rather than the mean. Electro encephalogr Clin Neurophysiol. 1968;25(4):3912. Copp ola R, Tab or R, Buchsbaum MS. Signal to noise ratio and resp onse variability measurements in single trial evoked p otentials. Electro encephalogr Clin Neurophysiol. 1978;44(2):21422. Davila CE, Mobin MS. Weighted averaging of evoked p otentials. IEEE Trans Biomed Eng. 1992;39(4):33845. Elamir EAH, Seheult AH. Trimmed L-moments. Comput Stat Data An 2003;43:299314. Fox LG, Daleb out SD. Use of the median metho d to enhance detection of the mismatch negativity in the resp onses of individual listeners. J Am Acad Audiol 2002;13(2):8392. Gastwirth JL, Cohen ML. Small Sample Behavior of Some Robust Linear Estimators of Lo cation. J Am Stat Asso c 1970;65(30):94673. Hamp el FR, Ronchetti EM, Rousseeuw PJ, Stahel W. Robust Statistics: The Approach based on Influence Functions. Wiley: Toronto, 1986. Hoke M, Ross B, Wickesb erg R, LutkenhЁ Ё oner B. Weighted averaging theory and application to electric resp onse audiometry. Electro encephalogr Clin Neurophysiol. 1984;57(5):4849. Hosking JRM. Lmoments: Analysis and Estimation of Distributions using Linear Combinations of Order Statistics. J Royal Stat So c B 1990;52(1):105 24. Jansen BH, Agarwal G, Hegde A, Boutros NN. Phase synchronization of the ongoing EEG and auditory EP generation. Clin Neurophysiol. 2003;114(1):7985. Lee JY. Low bias mean estimator for symmetric and asymmetric unimo dal distributions. Comput Metho ds Programs Biomed 2003;72(2):99107. 21

Lё eski JM. Robust weighted averaging. IEEE Trans Biomed Eng 2002 49(8):796804. Lop es da Silva F. Eventrelated p otentials: Metho dology and quantification. In Niedermeyer E, Lop es da Silva F, editors. Electro encephalography: Basic Principles, Clinical Applications, and Related Fields. Williams and Wilkins: Baltimore, 1999:94757. LutkenhЁ Ё oner B, Hoke M, Pantev C. Possibilities and limitations of weighted averaging. Biol Cyb ern. 1985;52(6):40916. Makeig S, Westerfield M, Jung TP, Enghoff S, Townsend J, Courchesne E, Sejnowski TJ. Dynamic brain sources of visual evoked resp onses. Science 2002;295(5555):6904. Nelder JA, Mead R. A Simplex Metho d for Function Minimization. Comput J 1965;7:30813. Ё Ё Ozdamar O, Delgado RE. Measurement of signal and noise characteristics in ongoing auditory brainstem resp onse averaging. Ann Biomed Eng. 1996;24(6):70215. Ё Ё Ozdamar O, Kalayci T. Median averaging of auditory brain stem resp onses. Ear Hearing 1999;20:25364. Picton TW, Lins OG, Scherg M. The recording and analysis of eventrelated p otentials. In F. Boller F, Grafman J, editors. Handb o ok of Neuropsychology, Elsevier: Amsterdam, 1995;10(14):373. Picton TW, Bentin S, Berg P, Donchin E, Hillyard SA, Johnson R Jr, Miller GA, Ritter W, Ruchkin DS, Rugg MD, Taylor MJ. Guidelines for using human event-related p otentials to study cognition: recording standards and publication criteria. Psychophysiology 2000;37(2):12752. Rice JR. Mathematical Statistics and Data Analysis. Duxbury Press: Belmont, 1995. 22

Streiner DL. Do you see what I mean? Indices of central tendency. Can J Psychiatry 2000 45(9):8336. Stuart A, Keith Ord J. Kendall's Advanced Theory of Statistics. Edward Arnold: London, 1994. Tukey JW: A survey of sampling from contaminated distributions. In Olkin I, editor. Contributions to Probability and Statistics. Stanford University Press:Stanford, 1960:448485. Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM. Brain-computer interfaces for communication and control. Clin Neurophysiol 2002;113:76791. Yab e H, Saito F, Fukushima Y. Median metho d for detecting endogenous event-related brain p otentials. Electro encephalogr Clin Neurophysiol. 1993;87(6):4037.

23

0.16 0.14 0.12

0.5

0.4
0.1 weight

weight

0.08 0.06 0.04 0.02 0

0.3

0.2

0.1

2

4

6

8

10 12 Ordered data

14

16

18

20

0

2

4

6

8

10 12 Ordered data

14

16

18

20

(a) Mean
0.16 0.14 0.12 0.1 weight 0.08 0.06 0.04 0.02 0 weight 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0

(b) Median

2

4

6

8

10 12 Ordered data

14

16

18

20

2

4

6

8

10 12 Ordered data

14

16

18

20

(c) Trimmed mean
0.16 0.14 0.12 0.1 weight 0.08 0.06 0.04 0.02 0 weight 2 4 6 8 10 12 Ordered data 14 16 18 20

(d) Winsorized mean
0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0

2

4

6

8

10 12 Ordered data

14

16

18

20

(e) TL-mean

(f ) Tanh mean

Fig. 1. Weights for the different lo cation estimators. Data (N = 20) are sorted in ascending order. The arithmetic mean uses the same weight 1/N for all observations. The median uses only the middle observation (o dd numb er of observations) or the two middle observations (even numb er of observations). Note that the scale is different in figure (b). The trimmed mean applies a zero weight for extreme observations and an equal weight for all other observations. The Winsorized mean concentrates the weights of the ignored extreme observations to the last accepted data p oints. The TL-mean applies higher weights for the middle observations, while the new tanh mean applies smo othly changing weights to the values close to extreme. The trimming parameters for the trimmed mean and the Winsorized mean are set in this example to ignore two minimum and two maximum values. Trimming parameter in TL-mean and parameters of tanh mean are set to ignore one minimum and one maximum values.

24

3 2 1 0 amplitude [µ V] -1 -2 -3 -4 -5 -6 -7 -1 tanh mean trimmed mean mean median -0.5 0 time [s] 0.5 1 amplitude [µ V]

4 3 2 1 0 -1 -2 -3 -1

-0.5

0 time [s]

0.5

1

(a) sub ject 1
8 6 4 amplitude [µ V] 2 0 -2 -4 -6 -1

(b) sub ject 2

-0.5

0 time [s]

0.5

1

(c) sub ject 3 Fig. 2. Comparison of the averaged ERPs (100 trials, raw data) using the mean averaging (black line), median averaging (green line), trimmed mean averaging (p = 25, blue line) and tanh averaging (k = 0.1, s = 0, red line). The stimulus is presented at zero time p oint. Median averaging enhanced strongly the noise. Some high frequency low amplitude noise app eared also in trimmed mean average. Conventional mean and tanh mean provided most smo oth waveforms while tanh mean enhanced N1 p eak comparing to mean averaging.

25

(a) sub j. 1, alpha = 0%

(b) sub j. 1, alpha = 20%

(c) sub j. 2, alpha = 0%

(d) sub j. 2, alpha = 20%

(e) sub j. 3, alpha = 0%

(f ) sub j. 3, alpha = 20%

Fig. 3. Comparison of SNR of the averaged ERPs using different averaging metho ds (mean (dots), median (solid line), trimmed mean (circles), Winsorized mean (X's) and TLmean (crosses)) and different values of trimming parameter p. SNR was computed 100 times for averaged indep endent subsets of 31 ep o chs randomly drawn from "cleaned" data set. Horizontal lines shows the values obtained for mean and median (corresp ond to lowest and highest p ossible p). alpha parameter shows the p ercentage of ep o chs with artificially added "alpha rhythm" (see text for details).

26

30

25

tanh estimator Trimmed mean TL mean Winsorized mean Arithmetic mean Median

SNR [dB]

20

15

10

5 5

10

15 20 25 30 35 40 Percentage of randomly removed trials

45

50

Fig. 4. Small sample p erformance of the estimators. SNR was estimated for averaged 100 ep o chs which were randomly chosen from sub ject 1 raw data set and for averaged ep o chs remained after randomly removing increasing numb er of ep o chs. Parameters of trimmed estimators were optimized using indep endent sets of the same numb er of randomly chosen ep o chs (see details in the text). SNR was averaged for 100 rep etitions of the pro cedure. Note that in the case of tanh mean 2 parameters were optimized instead of 1 parameter in three other trimmed estimators, while mean and median have no parameters which could b e optimized; this could at least partly contribute to SNR variations amongst the estimators.

27