Original document: http://hea-www.harvard.edu/AstroStat/etc/stenning+2013_mathematical_morphology_sam_11200_329-345.pdf

Morphological Feature Extraction for Statistical Learning With Applications To Solar Image Data
David C. Stenning1, Thomas C. M. Lee2, David A. van Dyk3, Vinay Kashyap4, Julia Sandell5 and C. Alex Young6

1 Department of Statistics, University of California, Irvine, CA 92617, USA
2 Department of Statistics, University of California, Davis, CA 95616, USA
3 Statistics Section, Department of Mathematics, Imperial College London, SW7 2AZ, UK
4 High-Energy Astrophysics Division, Smithsonian Astrophysical Observatory, Cambridge, MA 02138
5 Department of Physics, University of Pennsylvania, Philadelphia, PA 19104, USA
6 Heliophysics Science Division, NASA/GSFC, Greenbelt, MD 20771, USA

Received 3 May 2012; revised 21 March 2013; accepted 10 May 2013 DOI:10.1002/sam.11200 Published online in Wiley Online Library (wileyonlinelibrary.com).

Abstract: Many areas of science are generating large volumes of digital image data. In order to take full advantage of the high-resolution and high-cadence images modern technology is producing, methods to automatically process and analyze large batches of such images are needed. This involves reducing complex images to simple representations such as binary sketches or numerical summaries that capture embedded scientific information. Using techniques derived from mathematical morphology, we demonstrate how to reduce solar images into simple 'sketch' representations and numerical summaries that can be used for statistical learning. We demonstrate our general techniques on two specific examples: classifying sunspot groups and recognizing coronal loop structures. Our methodology reproduces manual classifications at an overall rate of 90% on a set of 119 magnetogram and white light images of sunspot groups. We also show that our methodology is competitive with other automated algorithms at producing coronal loop tracings and demonstrate robustness through noise simulations. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 6: 329-345, 2013

Keywords: mathematical morphology; image analysis; classification; sunspots; coronal loops; skeletonization

1. INTRODUCTION

The ability to extract meaningful information from large amounts of image data has far-reaching applications in such diverse fields as medicine, computer vision, and astronomy [1]. Advancements in imaging technology are yielding massive data sets that are increasingly laborious to process manually. Studying complex images 'by eye' also limits the types of analyses that can be performed, since interesting features must be extracted and propagated in machine-readable form before they can be utilized in sophisticated statistical procedures. The need for automated methods is particularly apparent in the field of solar physics, with observatories such as the Solar and Heliospheric Observatory (SoHO), the Transition Region and Coronal Explorer (TRACE), and the Solar Dynamics Observatory (SDO) carrying high-resolution instruments operating at various wavelengths. Experts typically detect and analyze features in SoHO and TRACE images manually, but newer observatories such as SDO -- with its continuous science data downlink rate of 130 Megabits per second -- render common labor-intensive techniques impractical.

Correspondence to: David A. van Dyk (dvandyk@imperial.ac.uk)

With routines arising from mathematical morphology (Section 2), we develop general techniques for extracting scientifically meaningful numerical quantities from complex high-throughput images. These quantities can be used as covariates in statistical learning methods for classification and, ultimately, for tracking and prediction of solar features. Our overriding goal is to extract scientifically meaningful and interpretable numerical features from solar images. The numerical features can be carried forward into secondary analyses that will also be interpretable in terms of meaningful scientific quantities.

Our goal of extracting numerical features from images for use in secondary statistical analysis is similar in spirit to the use of functional data as predictor variables in regression. This is typically accomplished using a set of independent basis functions that represent the functional data. Although this is a mathematically attractive strategy, it does not generally lead to scientifically meaningful summaries. One notable exception involves the use of a dependent library of generating functions to represent the functional predictors [2]. This allows quantities such as the frequency, locations, and size of dips, bumps, and plateaus to be captured and passed on to the secondary analysis. Like that approach, we aim to preserve scientifically meaningful summaries, but of very different predictors: images of complex solar features.

Mathematical morphology (MM) is a valuable tool for extracting shape characteristics from image data, and is well suited to the task of analyzing complex solar features. It is a nonlinear process, but we show below that it is highly effective in extracting useful numerical summaries from image data. Using appropriate morphological operations, images can be simplified by preserving the essential shape of geometric structures and eliminating noise.
Therefore, MM is an excellent imaging tool for filtering, segmentation, and taking measurements such as feature areas from an image.

Our general approach to solving practical solar imaging problems is to break the original problem into a sequence of subproblems until these subproblems can be solved in a relatively simple manner. For example, one may decompose an image classification problem into the following subproblems: (i) clean the image, (ii) perform segmentation to delineate the features of interest, (iii) extract various measurements from the image, and (iv) feed these measurements to a classifier. In this example, MM can naturally be applied to solve subproblems (i) to (iii). In the remainder of this section we describe two solar imaging problems for which MM can be employed to solve some subproblems.

A major concern of current solar physics, and a stated mission for current solar observatories (http://sdo.gsfc.nasa.gov/mission/about.php), is to improve understanding of the Sun's influence on Earth and Near-Earth space. Activity in the solar corona -- the Sun's 'atmosphere' -- resulting in extreme space-weather events can have a damaging impact on Earth. In particular, highly energetic events such as solar flares -- sudden bursts of radiation following the release of magnetic energy -- and coronal mass ejections (CMEs) -- massive bursts of coronal material -- eject charged particles into space, which have the potential to damage technological infrastructure (http://www.nap.edu/catalog.php?record_id=12507). For instance, a geomagnetic storm in 1989 was responsible for the collapse of the Hydro-Québec power grid and left millions of people without power for nine hours [3]. In addition, charged particles pose a danger to astronauts on the International Space Station, or even passengers flying in aircraft at high altitude (through both exposure to radiation and the potential damage to aircraft computer systems).

Solar flares and CMEs are known to be related to various observed solar features, in particular sunspots and their corresponding magnetic active regions. Sunspots are dark areas on the Sun's photosphere -- the region that emits the light that we see -- that form when convection is inhibited by intense magnetic fields. Sunspots are classified based on the complexity of the associated magnetic flux distribution as viewed in magnetograms, images of the spatially resolved line-of-sight magnetic field in the photosphere. One sunspot classification scheme in particular, the Mount Wilson scheme, has some power to predict solar flares and CMEs when combined with other space-weather data [4]. However, this classification is carried out manually and as a result is both laborious and prone to inconsistencies stemming from human observer bias [4]. That is, manual classification results in nonreproducible catalogs, as two experts looking at the same set of images will not always agree. Automated sunspot classification procedures based on statistical learning methods will result in reproducible catalogs, but require numerical covariates as inputs.
Using our general numerical feature extraction techniques, we produce summaries of sunspots/active regions from SoHO images that are relevant to the sunspot's classification. The scientific relevance of these numerical summaries is demonstrated by their successful use as input covariates to a supervised learning algorithm that can reproduce manual classifications with an acceptable level of agreement. As we will discuss in further detail in Section 3, it is not necessary or desirable to have the automatic classifier exactly mimic the manual class assignments. Insofar as the Mt. Wilson classification scheme contains relevant information regarding activity around a sunspot [4], by constructing numerical summaries guided by the Mt. Wilson classification rules we aim to capture the same useful scientific information. The key is that the information is obtained in a self-consistent manner, leading to more objective and reproducible data analyses. The scientific information will also be encoded in numerical feature vectors instead of images, opening up more opportunities for downstream analyses.

Ultimately, solar physicists are interested in how features observed on the photosphere are related to volatile events originating with the release of magnetic energy in the corona. Coronal loops -- plasma-filled structures that trace out the Sun's magnetic field -- are rooted in the photosphere (the roots are referred to as footpoints) and are thus related to the morphological configurations of sunspot groups. In the vast majority of cases coronal loops are identified manually pixel-by-pixel, which is laborious and inconsistent. Hence, complex TRACE and SDO extreme ultraviolet wavelength (EUV) images provide another useful benchmark for testing our general feature extraction techniques, where the objective is to produce simple but scientifically meaningful representations of coronal loop structures that can then be used in subsequent automated procedures. Our goal is to carry out loop tracings self-consistently, based solely on the images, without invoking external factors such as magnetic field configurations. The value in these tracings is in how they are utilized in subsequent analyses by solar physicists.

The increase in quantity and quality of solar image data has spurred interest in developing automated techniques for processing such data. A general review of existing image processing techniques -- including MM -- useful for automated feature recognition with solar data is given in ref. 5. Simple MM is used by Curto et al. [6] in their procedures for automatically detecting and grouping sunspots. While this method is broadly similar to the initial step of our approach, it focuses on identifying sunspot groups, whereas we are interested both in classifying sunspot groups and in obtaining numerical summaries of sunspot groups and active regions that can be used for statistical learning. Identification of the sunspot groups is a necessary precursor to both of these tasks.
Colak and Qahwaji [7] present a system for automatically detecting and classifying sunspot groups according to the McIntosh classification scheme [8]. While we develop our methodology to match the Mt. Wilson scheme, which is more useful as a measure of the complexity of the magnetic field structure, our results and reclassifications will be applicable in either case.

Although several groups have worked on automated methods for tracing coronal loops, a satisfactory method for this challenging task remains elusive. Aschwanden et al. [9], for example, compare five algorithms for tracing coronal loops coming from four independent research groups and demonstrate that none of these methods can adequately reproduce results obtained from manual/visual tracing. We illustrate our method on the same test TRACE image and show that our method is competitive. Aschwanden et al. emphasize that comparison to manual/visual techniques is not necessarily a useful benchmark for evaluating automated routines, but the lack of robustness when comparing the various methods is disappointing. In particular, current methods for sewing together detected loop fragments and for quantifying uncertainty in traced loops are either unsatisfactory or nonexistent.

This article is divided into five sections. We begin in Section 2 with a brief introduction to MM and standard image analysis tools, and describe our general approach for extracting scientifically meaningful numerical features from images that can be used for statistical learning. In Section 3 we show how our general techniques can be used to extract numerical features from complex SoHO magnetogram and white light images that can be used in an automatic sunspot classification algorithm. In Section 4 we present an example of how our techniques can be applied to TRACE and SDO EUV images to automatically recognize and analyze coronal loops. Finally, in Section 5 we discuss our results and directions for future work.

Throughout this article we use the word feature to describe interesting aspects of an image, such as sunspots, active regions, or coronal loops. This is not to be confused with the numerical summaries that are typically referred to as features in the machine learning literature. We refer to the latter as numerical features to avoid potential confusion. The data sets and code used to perform our analysis can be found at http://cfa.lib.harvard.edu/dvn/dv/dstenning.

2. SCIENCE-DRIVEN IMAGE ANALYSIS

The goal of science-driven image analysis is to derive scientifically meaningful quantities and machine-readable representations of images that can be used for statistical learning. MM, when combined with standard image analysis techniques, is a powerful tool for capturing the essential scientific information in a simple 'sketch' representation, a segmented image that resembles the drawing an expert would make in copying the raw image by hand. For example, a simplified representation of a coronal loop image is a binary image with pixels corresponding to the loop structure assigned a value of one. Magnetograms can be likewise segmented into simplified 'trinary' images with regions of negative magnetic polarity, positive magnetic polarity and background assigned values of two, one, and zero, respectively. The binary/trinary images sketch the solar features of interest so that numerical summaries capturing important scientific information can be calculated.

2.1. Feature Recognition

The first step in science-driven image analysis is to recognize scientifically meaningful features. For example, we need to be able to detect sunspots, active regions, and coronal loops in solar image data, as those features provide rich information about solar processes. Here we describe two typical methods used for general feature recognition and comment on their feasibility for science-driven image analysis.

Thresholding: By looking at an intensity histogram of an image, we can often determine whether the interesting features are best identified by thresholding the histogram at some particular value. Typical strategies to determine the threshold value include using the standard deviation in the histogram, using a global or a local median filter, etc. However, this method is not universally applicable because the features of interest may not be the brightest, or may exhibit variation in intensity. There is also no general justification for choosing one type of thresholding over another. Thus, care must be taken to ensure that the adopted threshold is not destructive to the feature we wish to study.

Background Subtraction: Background subtraction enhances the contrast of an image by making the interesting features more prominent. Typically the background is determined locally, by measuring the intensity in pixels surrounding a feature. However, for solar features, such local determination is generally not a reliable estimator of the true background. This is because (i) background pixels will be contaminated by spillover emission from the source feature and (ii) there may be overlapping features over the alleged background pixels. We therefore do not use background subtraction to detect sunspots, active regions, or coronal loops, but nevertheless carry out this operation on the TRACE and SDO images with a view toward improving the visibility of the loops.
We determine the background as an average over the border of a 10 × 10 pixel cell, and subtract it from the average over the inner cell (a 2 × 2 pixel cell for TRACE and a 3 × 3 pixel cell for SDO) to determine the background-subtracted source intensity. We also test the sensitivity of our procedure to variation in cell size (see Section 4.3).
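The cell-based background subtraction just described can be illustrated for a single cell with a minimal sketch. The function names (`ring_mean`, `inner_mean`, `background_subtracted`) are hypothetical, the "border" is assumed to mean the outermost one-pixel ring of the cell, and the inner cell is assumed to sit as centrally as the cell size allows; the text does not specify either detail, so this is not necessarily the exact procedure used for the TRACE and SDO images.

```python
def ring_mean(cell):
    """Mean of the outermost one-pixel ring (assumed 'border') of a square cell."""
    n = len(cell)
    border = [cell[r][c] for r in range(n) for c in range(n)
              if r in (0, n - 1) or c in (0, n - 1)]
    return sum(border) / len(border)


def inner_mean(cell, inner):
    """Mean of an inner-by-inner block placed as centrally as possible."""
    n = len(cell)
    start = (n - inner) // 2  # assumption: round down when exact centering is impossible
    block = [cell[r][c]
             for r in range(start, start + inner)
             for c in range(start, start + inner)]
    return sum(block) / len(block)


def background_subtracted(cell, inner=2):
    """Inner-cell average minus border average for one cell."""
    return inner_mean(cell, inner) - ring_mean(cell)


# Flat background of 1.0 with a brighter 2 x 2 source in the middle of a 10 x 10 cell.
cell = [[1.0] * 10 for _ in range(10)]
for r in (4, 5):
    for c in (4, 5):
        cell[r][c] = 5.0
print(background_subtracted(cell, inner=2))  # 4.0
```

Calling `background_subtracted(cell, inner=3)` corresponds to the 3 × 3 inner cell quoted for SDO; sliding the cell across an image would give a background-subtracted map.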

2.2. Mathematical Morphology

MM is a powerful tool for extracting and processing scientific information from image data because morphological operations relate directly to the shape of observed features. Here we introduce some morphological operations that are useful in extracting scientifically meaningful numerical features from images. A more detailed introduction to morphological analysis is given in the Appendix. More in-depth coverage can be found in refs 10 and 11.

Dilation and Erosion: Dilation and erosion are the two fundamental operations in MM. They form a duality and they both use a structuring element (SE) Y to probe and alter the shapes of geometric structures inside an image I. The dilation of I by Y is the set of points z such that Y hits I when the origin of Y is placed at z. Therefore the dilation of I always enlarges I. The erosion of I by Y is defined as the set of points z such that Y fits wholly inside I when the origin of Y is at z. In contrast to dilation, erosion always shrinks I. For real-valued/grayscale images, the SE smoothes the three-dimensional image surface, with the height of the image surface at each pixel being equal to its intensity value.

Morphological Opening: A morphological opening operation involves, first, an erosion of the image with an SE, followed by a dilation with the same SE. Since after an erosion only those features in the image that are morphologically similar to the SE are still present, this effectively enhances such features in the image and smoothes them from the interior. Opening also has a filtering effect: image structures that cannot completely contain the SE are removed from the image. A simple example of a morphological opening operation on a binary image is given in Fig. 1(a).

Morphological Closing: The opposite operation to opening is morphological closing, which smoothes features from their exterior.
A closing operation is a dilation followed by an erosion, which essentially smoothes out the image and fills in gaps without degrading or distorting the salient features, as would occur with normal boxcar or Gaussian smoothing. A simple example of a morphological closing operation on a binary image is given in Fig. 1(b). In practice, choosing between morphological opening and closing depends on the features to be enhanced or the type of noise to be removed.

Fig. 1 Illustration of morphological opening and closing on binary images. (a) Opening of a set X by a disk B. (b) Closing of a set X by a disk B.

Fig. 2 Examples of the four classes of sunspot groups used in the Mt. Wilson scheme, with magnetograms in the top row and white light images in the bottom row. The α class (a) is dominated by a single unipolar sunspot that appears white or black in the magnetogram, depending on the polarity (positive or negative). The β class (b) has spots of both positive and negative polarity that can be separated by a single north-south polarity inversion line. The βγ class (c) exhibits a complex distribution of polarities, and a single north-south polarity inversion line cannot cleanly divide the positive and negative regions of magnetic flux. In the βγδ class (d), examination of the white light image in conjunction with the magnetogram reveals umbrae of different polarity within a single enclosed penumbra.

Morphological Skeletonization: Skeletonization extracts the interior 'skeletons' in extended regions; the locus of the points that form the skeleton traces out the spine of the region, yielding a sketch representation of the original features. The skeleton pixels are the innermost possible pixels in the region, and are ideally suited to capture, for example, a simplified representation of coronal loops that can then be used to extract location/shape information.

Morphological Pruning: Morphological pruning removes the small offshoots that may exist in a morphological skeleton owing to irregularities in the boundaries of the region. Such offshoots can be eliminated by first identifying the locations where the offshoots exist, then finding the lengths of such regions, and then eliminating all structures that are a few pixels long or smaller, to produce a cleaned skeleton that better represents the feature of interest.
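The set-based definitions of dilation, erosion, opening, and closing given in this section can be written out directly for binary images. The following is a minimal sketch only: a binary image is represented as a set of (row, column) pixels, and the cross-shaped SE, the names `CROSS`, `dilate`, `erode`, `opening`, and `closing`, and the toy images are all illustrative assumptions rather than the disk and spherical SEs or the code used in this article. The SE is assumed to be symmetric and to contain its origin.

```python
# Symmetric, cross-shaped SE given as offsets from its origin (an assumed choice).
CROSS = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}


def dilate(image, se=CROSS):
    """Points z at which the SE, with its origin at z, hits the image."""
    return {(r - dr, c - dc) for (r, c) in image for (dr, dc) in se}


def erode(image, se=CROSS):
    """Points z at which the SE, with its origin at z, fits wholly inside the image.

    Only image pixels are tested, which is valid because the SE contains its origin.
    """
    return {(r, c) for (r, c) in image
            if all((r + dr, c + dc) in image for (dr, dc) in se)}


def opening(image, se=CROSS):
    """Erosion followed by dilation with the same SE."""
    return dilate(erode(image, se), se)


def closing(image, se=CROSS):
    """Dilation followed by erosion with the same SE."""
    return erode(dilate(image, se), se)


square = {(r, c) for r in range(5) for c in range(5)}

# Opening removes an isolated noise pixel that cannot contain the SE.
noisy = square | {(0, 10)}
print((0, 10) in opening(noisy))  # False

# Closing fills a one-pixel hole in the interior of the square.
holed = square - {(2, 2)}
print(closing(holed) == square)  # True
```

In this toy case the opening also trims the four corner pixels of the square, because the cross cannot fit there; this is the "smoothing from the interior" described above. Practical work on full-size images would more likely rely on an established image-processing library than on pixel sets.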

3. SUNSPOT CLASSIFICATION

3.1. Mount Wilson Classification

The Mt. Wilson classification scheme groups sunspots into four broad classes based on the morphology of magnetically active regions as viewed in magnetogram images. Examples of the four classes appear in Fig. 2. The simplest class morphologically is the α class, defined as a single unipolar sunspot -- a single spot of either positive or negative polarity, which is often linked to a plage of opposite polarity. Plage is a diffuse network of magnetic fluxtube footpoints formed when magnetic field lines shooting outward from the photosphere scatter down over a wide area. For bipolar sunspot groups, spots of opposite magnetic polarity are visible in magnetogram images and multiple sunspots tend to be present in the white light images, forming a sunspot group. The simplest bipolar class morphologically is the β class, which is a pair of sunspots of opposite magnetic polarity with a single north-south polarity inversion line -- a simple and distinct linear spatial division oriented in the solar north-south direction -- between the polarities. If a bipolar group is sufficiently complex that a single north-south polarity inversion line cannot divide the two polarities, then it is a βγ sunspot group. If a βγ group also contains umbrae of different polarity inside a single penumbra, which is known as a delta spot, then it is a βγδ sunspot group. The umbra is the dark, inner part of the sunspot, and is surrounded by the slightly lighter penumbra, as can be clearly seen in the white light image (bottom row) of Fig. 2(d).

Fig. 3 Identifying active region pixels. (a) The raw white light image. (b) The inverted white light image after applying a morphological opening operation using a spherical SE with radius 5. (c) The pixels belonging to the sunspot group identified by thresholding. (d) The sunspot area found by twice dilating the previous image using a disk-shaped SE with radius 1. (e) The raw magnetogram. (f) The magnetogram after applying a morphological opening operation using a spherical SE with radius 1. (g) The positive polarity active region pixels identified by thresholding. (h) The inverted magnetogram after applying a morphological opening operation using a spherical SE with radius 1. (i) The negative polarity active region pixels identified by thresholding. (j) The simple active region representation found by combining the positive and negative polarity active region pixels and excluding any pixels that are not also identified as part of the sunspot area in image (d).

Classification of sunspots is commonly performed through visual inspection by experts, and publicly available sunspot lists are manually determined. The Mt. Wilson scheme is popular because it is based on a simple and interpretable set of rules (as described above) and has some power to predict flares when combined with other solar data [4]. However, while the classification rules are simple, the morphology of active regions is better described by a continuum than by a discrete clustering. For example, the morphology of a particular active region may exist somewhere between a β group and a βγ group, and experts may disagree as to the 'correct' classification. As a result, manual classification in general suffers from human observer bias stemming from the subjective and often ambiguous morphologies of active regions [4]. A catalog of sunspot identifications and classifications constructed manually is nonreproducible, which partly motivates our automated procedure.
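The four-class rule set of Section 3.1 can be written as a short decision function, which makes explicit how few yes/no judgments the scheme requires. This is a sketch of the rules as stated in the text, using the conventional Mount Wilson labels α, β, βγ, and βγδ (spelled out in ASCII); the function name and its three Boolean inputs are hypothetical, and it assumes that those judgments have already been made, by a person or by an image-analysis step. It is not the statistical classifier developed in this article.

```python
def mount_wilson_class(bipolar, single_ns_line_separates, has_delta_spot):
    """Return the Mount Wilson class implied by three yes/no judgments.

    bipolar: spots of both magnetic polarities are present.
    single_ns_line_separates: one north-south polarity inversion line
        divides the two polarities.
    has_delta_spot: umbrae of different polarity share a single penumbra.
    """
    if not bipolar:
        return "alpha"             # single unipolar sunspot
    if single_ns_line_separates:
        return "beta"              # simple bipolar group
    if has_delta_spot:
        return "beta-gamma-delta"  # complex group containing a delta spot
    return "beta-gamma"            # complex group without a delta spot


print(mount_wilson_class(False, False, False))  # alpha
print(mount_wilson_class(True, True, False))    # beta
print(mount_wilson_class(True, False, False))   # beta-gamma
print(mount_wilson_class(True, False, True))    # beta-gamma-delta
```

The hard part in practice is not this logic but deciding the inputs, in particular whether a single north-south line "cleanly" separates the polarities, which is where expert disagreement arises.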

3.2. Generating Numerical Summaries of Solar Active Regions

In this section we describe how MM can be used to extract scientifically meaningful and statistically useful numerical features. In particular, we detail our step-by-step procedure for generating numerical summaries of active region morphology using SoHO magnetogram and white light images, improving and extending upon our work described in ref. 12. Specifically, we use the white light images to obtain the general location of active regions in magnetograms to better differentiate between active region and plage network. We also calculate additional numerical summaries that characterize active region complexity; these are of scientific interest in addition to serving as input covariates to statistical learning algorithms aimed at sunspot classification.

Our general strategy is to obtain simple sketches (in the form of trinary images) of sunspot groups in white light images and magnetically active regions in magnetograms. Then, we calculate numerical summaries from the sketches that summarize the morphology of the magnetic flux distribution, are relevant to a sunspot group's classification, and can therefore be used for statistical learning. As these numerical summaries are based on the Mt. Wilson classification rules, they have a scientific basis and are interpretable to a solar physicist. In this way, we reduce complex images to real-valued numerical feature vectors that summarize the morphological characteristics of sunspot groups and associated active regions. Our general methodology is illustrated and summarized in schematic form in Figs 3-5.
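Two building blocks of this strategy can be sketched in a few lines: a global threshold at the mean plus a multiple of the sample standard deviation (Section 3.2.1 quotes a multiple of four for the white light image), and the combination of polarity masks into a trinary sketch restricted to the sunspot area, as in Fig. 3(j). The function names `threshold_mask` and `trinary_sketch`, the list-of-lists image representation, and the toy inputs are assumptions made for illustration; whether the same threshold multiple is applied to the magnetograms is not stated here, so the masks passed to `trinary_sketch` are simply taken as given.

```python
from statistics import mean, stdev


def threshold_mask(image, k=4.0):
    """Binary mask of pixels with values above mean + k * sample standard deviation."""
    values = [v for row in image for v in row]
    cutoff = mean(values) + k * stdev(values)
    return [[1 if v > cutoff else 0 for v in row] for row in image]


def trinary_sketch(positive, negative, sunspot_area):
    """Combine masks into a trinary image: 1 = positive, 2 = negative, 0 = background."""
    rows, cols = len(positive), len(positive[0])
    sketch = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not sunspot_area[r][c]:
                continue  # drop pixels outside the white light sunspot area
            if positive[r][c]:
                sketch[r][c] = 1
            elif negative[r][c]:
                sketch[r][c] = 2
    return sketch


# A single bright pixel on a flat 10 x 10 background survives the threshold.
image = [[0.0] * 10 for _ in range(10)]
image[3][4] = 100.0
mask = threshold_mask(image)
print(sum(map(sum, mask)), mask[3][4])  # 1 1

# The third pixel has negative polarity but lies outside the sunspot area.
print(trinary_sketch([[1, 0, 0]], [[0, 1, 1]], [[1, 1, 0]]))  # [[1, 2, 0]]
```

Numerical summaries such as the number of pixels of each polarity can then be read directly off the trinary sketch.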

Fig. 4 Extracting numerical summaries of active regions. (a) The simple active region representation obtained through the process demonstrated in Fig. 3. (b) The separating boundary between regions of opposite magnetic polarity obtained via seeded region growing. (c) The polarity separating line obtained by removing interior and border pixels, followed by applying both a morphological opening (using a disk-shaped SE of radius one) and a morphological pruning to reduce jaggedness. (d) The simple active region representation in (a) after putting separate convex hulls around opposite polarity active region pixels.

Fig. 5 Identifying delta spots. (a) The original white light image. (b) The inverted white light image after applying a morphological opening operation using a spherical SE with radius one. (c) The pixels belonging to the sunspot group, with nonzero pixel values assigned by point-wise multiplication of the binary image obtained by thresholding image (b) and the smoothed white light image. (d) A simple representation of the umbra and penumbra regions obtained by thresholding on only the nonzero pixels from image (c). This is used in conjunction with image (j) from Fig. 3 to determine if there are umbrae with opposite polarity within a single enclosed penumbra, which is then identified as a delta spot.

3.2.1. Sunspot and active region identification

In the first part of our procedure, we use the white light images to identify sunspots. This provides us with a general location of the active regions in magnetogram images, and helps distinguish the active regions from plage network. To do this, we first clean (i.e., smooth) the inverted white light image (the image obtained by multiplying each pixel value by -1) with a morphological opening operation using a spherical SE with radius five. We use a round SE because of the circular appearance of sunspots and active regions, and the radius was chosen so that small structures will be filtered out and larger round structures (i.e., the sunspots) will be smoothed. Because at this point in our procedure we are only concerned with identifying the general location of sunspots in the white light image, we are not concerned with possible destruction of features. Next, a thresholding operation is applied by setting pixels with values above x̄ + 4s to one and the rest to zero, where x̄ and s are, respectively, the mean and sample standard deviation of all the pixel values in the image. The resulting binary image is dilated twice using a disk-shaped SE of radius one, which slightly increases the total area of pixels with value one. The location of sunspots in the white light image is now identified by the pixels in the binary image with value one, which we call the sunspot area. This process for id