Face image super-resolution from video data with non-uniform illumination
Andrey S. Krylov, Andrey V. Nasonov, Dmitry V. Sorokin
Faculty of Computational Mathematics and Cybernetics
Moscow Lomonosov State University, Moscow, Russia
kryl@cs.msu.su, nasonov@cs.msu.ru, sorokin_dm@bk.ru

Abstract
The Tikhonov regularization approach and a block motion model are used to solve the super-resolution problem for face video data. The video is preprocessed by the 2-D empirical mode decomposition method to suppress illumination artifacts before super-resolution.

Keywords: face super-resolution, video, EMD.

1. INTRODUCTION
The problem of super-resolution is to recover a high-resolution image from a set of several degraded low-resolution images. This problem is very helpful for human face detection in surveillance, biometrics, etc., because super-resolution can significantly improve image quality.

Face super-resolution algorithms can be divided into two groups: learning-based and reconstruction-based. Learning-based algorithms collect information about the correspondence between low- and high-resolution images and use the gathered information for resolution enhancement. These methods are not actually super-resolution methods, because they operate on a single image: they do not reconstruct missing data, they only predict it using a learning database. Several input images do not significantly improve the resolution and only help to reduce the probability of using incorrect information from the database. The most popular method is the Baker method [1], [2], which decomposes the image into a Laplacian pyramid and predicts its values for the high-resolution image. Patch-based methods are popular too. They divide low- and high-resolution images into sets of pairs of fixed-size rectangles called patches and substitute the most appropriate patches into the high-resolution image. They vary by learning and substitution methods, for example, neural networks [3], locality preserving projections [4], asymmetric associative learning [5], locally linear embedding [6], etc. Principal component analysis is also used for learning-based super-resolution [7].

Reconstruction-based algorithms use only low-resolution images to construct the high-resolution image. Most reconstruction-based algorithms use camera models [8] for downsampling the high-resolution image. The problem is formulated as an error minimization problem

$$\bar{z} = \arg\min_{z \in Z} \sum_k \| A_k z - v_k \|, \tag{1}$$

where $z$ is the unknown high-resolution image, $v_k$ is the $k$-th low-resolution image, and $A_k$ is an operator which transforms the high-resolution image into a low-resolution one. Various norms are used. The operator can generally be represented as $A_k z = D H_{cam} F_k H_{atm} z + n$, where $H_{atm}$ is the atmospheric turbulence effect, which is often neglected, $F_k$ is a warping operator (such as motion blur or image shift) for the $k$-th image, $H_{cam}$ is the camera lens blur, which is usually modeled as a Gauss filter, $D$ is the downsampling operator, and $n$ is noise, usually Gaussian. In many cases only the translation model is considered and noise is ignored, so $F_k$ can be merged with $H_{cam}$ and the transform operator is simplified to $A_k z = D H_k z$, where $H_k$ is a shifted Gauss filter with the kernel

$$H_k(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(x - x_k)^2 + (y - y_k)^2}{2\sigma^2}\right), \tag{2}$$

where $x_k$ and $y_k$ are the shifts of the high-resolution image relative to the $k$-th low-resolution image along the x and y axes respectively.

There are different methods to solve (1). The most widely used methods are [9]: iterated error back-projection, which minimizes the error functional using error upsampling and subtraction from the high-resolution image [10], [11]; stochastic reconstruction methods [12]; projections onto convex sets [13], [14]; Tikhonov regularization [8]; and single-pass filtering, which approximates the solution of (1) [15], [16].

A linear translation model is usually insufficient for the super-resolution problem, because the motion is non-linear. Different motion models are used [4], [15]. It is computationally inefficient to calculate the motion for every pixel. The motion of adjacent pixels is usually similar, so the motion of only several pixels is calculated and the motion of the other pixels is interpolated. The simplest model is a regular motion field [4]. For large images, it is effective to calculate the motion of pixels which belong to edges and corners [15].

2. OUR APPROACH
We consider the task of face image super-resolution from video data. We use the Tikhonov regularization approach [8] and a block motion model, because problem (1) is ill-conditioned or even ill-posed. We use the $l_1$ norm $\|z\|_1 = \sum_{i,j} |z_{i,j}|$ instead of the standard Euclidean norm, because it has shown better results:

$$\bar{z} = \arg\min_{z \in Z} \sum_k \| A_k z - v_k \|_1 + \lambda f(z). \tag{3}$$

We choose the total variation functional $TV(z) = \sum_{i,j} |z_{i+1,j} - z_{i,j}| + \sum_{i,j} |z_{i,j+1} - z_{i,j}|$ and the bilateral TV functional [8]

$$BTV(z) = \sum_{-p \le x \le p,\; -p \le y \le p} \alpha^{|x| + |y|} \left\| S^{x,y} z - z \right\|_1$$

as a stabilizer $f(z)$, where $S^{x,y}$ is a shift operator along the horizontal and vertical axes by $x$ and $y$ pixels respectively, $\alpha = 0.8$, and $p = 1$ or $2$. $TV(z)$ can also be represented as $TV(z) = \|S^{1,0} z - z\|_1 + \|S^{0,1} z - z\|_1$.
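The two stabilizers $TV(z)$ and $BTV(z)$ can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the shift operator is assumed to replicate border pixels (so $S^{x,y} z - z$ vanishes at the border), the shift $x$ is applied to the row index $i$ to match the $z_{i+1,j}$ term of $TV$, and all function names are hypothetical. The example also checks the identity $TV(z) = \|S^{1,0} z - z\|_1 + \|S^{0,1} z - z\|_1$.

```python
import numpy as np

def shift(z, x, y):
    """Hypothetical S^{x,y}: (S^{x,y} z)[i, j] = z[i + x, j + y], border pixels replicated."""
    n, m = z.shape
    rows = np.clip(np.arange(n) + x, 0, n - 1)
    cols = np.clip(np.arange(m) + y, 0, m - 1)
    return z[np.ix_(rows, cols)]

def tv(z):
    """TV(z) = sum |z[i+1,j] - z[i,j]| + sum |z[i,j+1] - z[i,j]|."""
    return np.abs(np.diff(z, axis=0)).sum() + np.abs(np.diff(z, axis=1)).sum()

def tv_via_shifts(z):
    """The same functional written as ||S^{1,0} z - z||_1 + ||S^{0,1} z - z||_1."""
    return np.abs(shift(z, 1, 0) - z).sum() + np.abs(shift(z, 0, 1) - z).sum()

def btv(z, p=2, alpha=0.8):
    """Bilateral TV: sum over -p <= x, y <= p of alpha^(|x|+|y|) * ||S^{x,y} z - z||_1."""
    total = 0.0
    for x in range(-p, p + 1):
        for y in range(-p, p + 1):
            total += alpha ** (abs(x) + abs(y)) * np.abs(shift(z, x, y) - z).sum()
    return total

z = np.arange(12.0).reshape(3, 4)
print(tv(z), tv_via_shifts(z))  # 41.0 41.0
```

With $p = 1$, BTV only adds the diagonal neighbours (weighted by $\alpha^2$) to the two TV directions; larger $p$ penalizes differences over longer distances with geometrically smaller weights.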

All the images are considered as the results of motion of the first image. Both the first and the target images are convolved with a Gauss filter to suppress noise. For motion estimation, we calculate the motion on a regular grid $G$ with a step of 8 to 16 pixels. For every point of the grid $G$, we take a small square block (8 to 16 pixels wide) from the first image centered at this point. Then we find the optimal shift of this block in the target image with pixel accuracy using the least mean square approach. To calculate the motion with subpixel accuracy, we convolve the first image with the shifted Gauss filter (2). The motion of the other pixels is interpolated linearly (for example, using a bilinear or Gauss filter). Then a set of matrices $T^{(k)}$ with points $T^{(k)}_{i,j} = (x_i, y_j)$ is constructed. The $k$-th matrix represents the correspondence between pixels of the $k$-th image and the first image. Next, we multiply every element of these matrices by the resampling scale factor, so that the matrices represent the correspondence of pixels between the low-resolution images and the high-resolution image.

In this case, the transform operator $A_k$ takes the form $A_k = T^{(k)} H$, where $H$ is a zero-mean Gauss filter and $u^{(k)} = T^{(k)} z$ is the motion-compensated downsampling operator:

$$u^{(k)}_{i,j} = z_{x_i, y_j}, \quad \text{where } (x_i, y_j) = T^{(k)}_{i,j}.$$

If $x_i$ or $y_j$ is not an integer value, then $z_{x_i, y_j}$ is approximated using bilinear interpolation. Note: it would be more precise to use the shifted Gauss filter to calculate $z_{x_i, y_j}$, but it would be very slow. The Gauss filter $H$ reduces high-band frequencies, so bilinear approximation is sufficient.

3. NUMERICAL METHOD
We use the iterative subgradient method with non-constant step [17] for fast minimization of (3). The iterations look like

$$z^{(n+1)} = z^{(n)} - \mu_n g^{(n)}, \tag{4}$$

where $g^{(n)} \in \partial F(z)|_{z = z^{(n)}}$ is any subgradient of the objective functional

$$F(z) = \sum_k \left\| T^{(k)} H z - v_k \right\|_1 + \lambda f(z).$$

A vector $g^{(n)}$ is an element of the subgradient set $\partial F(z)|_{z = z^{(n)}}$ of $F(z)$ at $z^{(n)}$ if it satisfies the condition

$$F(z) - F(z^{(n)}) \ge \left(g^{(n)}, z - z^{(n)}\right) \quad \text{for all } z.$$

If $F(z)$ is differentiable at a point, only one subgradient exists there, and it equals the ordinary gradient.

The subgradient of $J(u) = \|u\|_1$ at the grid points is

$$g(u)_{i,j} \in \partial J(u)_{i,j} = \begin{cases} \{1\}, & u_{i,j} > 0, \\ \{-1\}, & u_{i,j} < 0, \\ [-1, 1], & u_{i,j} = 0; \end{cases}$$

in the last case we assume $g(u)_{i,j} = 0$. Thus $\operatorname{sign} u \in \partial J(u)$, with the sign function applied to each element of $u$. The subgradient of $F(z)$ can be written in the form

$$g^{(n)} = \sum_k H^* T^{(k)*} \operatorname{sign}\left(T^{(k)} H z^{(n)} - v_k\right) + \lambda \, \partial f(z^{(n)}).$$

$H^*$ and $T^{(k)*}$ are the standard conjugate operators defined by the 2-D Euclidean scalar product. For the Gauss filter, $H^* = H$. $T^{(k)*} u^{(k)} = z^{(k)}$ is constructed in the following way: first, $z^{(k)}$ is zero-filled; then for every pixel $(i, j)$ of $u^{(k)}$ we obtain its coordinates in $z^{(k)}$, $(x_i, y_j) = T^{(k)}_{i,j}$, and add the value $u^{(k)}_{i,j}$ to $z^{(k)}_{x_i, y_j}$. For non-integer coordinates, we add the value to the nearest pixels with coefficients obtained by bilinear interpolation.

For the case $f(z) = BTV(z)$, the subgradient is

$$\partial f(z) \ni \sum_{-p \le x, y \le p} \alpha^{|x| + |y|} \left(S^{-x,-y} - I\right) \operatorname{sign}\left(S^{x,y} z - z\right),$$

where $I$ is the identity (unit) operator. For $f(z) = TV(z)$, the subgradient is calculated in the same way. The coefficients $\mu_n$ in (4) satisfy the step length condition $\mu_n \|g^{(n)}\|_1 = s_n$, where the step lengths $s_n$ are chosen a priori in the form $s_n = s_0 q^n$, $0 < q < 1$. We use $s_0 = 50$ and choose $q$ to obtain $s_{N-1} = 0.1$ for the last iteration.

The application of the proposed super-resolution method is shown in Figure 1. For sequential video data, this method shows better results than the single-image resampling methods shown in Figure 1.
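A toy version of iteration (4) with $f(z) = TV(z)$ can be sketched as below. It is a minimal sketch under simplifying assumptions, not the authors' implementation: motion is restricted to integer offsets, so $T^{(k)}$ reduces to decimation by 2 with an offset and $T^{(k)*}$ to zero-filled scatter-add (the bilinear weights of the text are not needed); $H$ is a 3×3 Gauss filter with zero padding, whose symmetric kernel makes it self-adjoint ($H^* = H$); the step rule $\mu_n \|g^{(n)}\|_1 = s_n$ with $s_0 = 50$ and $s_{N-1} = 0.1$ follows the text. Keeping the best iterate is an addition here (a common safeguard, since the subgradient method is not a descent method). All names, the offsets and the value $\lambda = 0.01$ are hypothetical.

```python
import numpy as np

def blur(z, sigma=0.7):
    """H: 3x3 Gauss filter with zero padding; the kernel is symmetric, so H* = H."""
    k1 = np.exp(-np.arange(-1, 2) ** 2 / (2 * sigma ** 2))
    k = np.outer(k1, k1)
    k /= k.sum()
    n, m = z.shape
    zp = np.pad(z, 1)  # zero padding
    out = np.zeros((n, m))
    for a in range(3):
        for b in range(3):
            out += k[a, b] * zp[a:a + n, b:b + m]
    return out

def T(z, dy, dx):
    """Hypothetical T^(k): decimation by 2 with an integer offset (dy, dx)."""
    return z[dy::2, dx::2]

def T_adj(u, dy, dx, shape):
    """T^(k)*: zero-fill, then add each low-res value at its high-res position."""
    z = np.zeros(shape)
    z[dy::2, dx::2] += u
    return z

def tv(z):
    return np.abs(np.diff(z, axis=0)).sum() + np.abs(np.diff(z, axis=1)).sum()

def tv_subgrad(z):
    """An element of the TV subdifferential (sign(0) = 0, as in the text)."""
    g = np.zeros_like(z)
    d = np.sign(np.diff(z, axis=0))
    g[1:, :] += d
    g[:-1, :] -= d
    d = np.sign(np.diff(z, axis=1))
    g[:, 1:] += d
    g[:, :-1] -= d
    return g

def objective(z, lows, offsets, lam):
    """F(z) = sum_k ||T^(k) H z - v_k||_1 + lam * TV(z)."""
    data = sum(np.abs(T(blur(z), dy, dx) - v).sum() for v, (dy, dx) in zip(lows, offsets))
    return data + lam * tv(z)

def super_resolve(lows, offsets, shape, lam=0.01, n_iter=200, s0=50.0, s_last=0.1):
    q = (s_last / s0) ** (1.0 / (n_iter - 1))  # s_n = s0 * q**n, s_{N-1} = s_last
    z = np.zeros(shape)
    best, best_f = z.copy(), objective(z, lows, offsets, lam)
    for n in range(n_iter):
        # g = sum_k H* T^(k)* sign(T^(k) H z - v_k) + lam * (TV subgradient)
        g = sum(blur(T_adj(np.sign(T(blur(z), dy, dx) - v), dy, dx, shape))
                for v, (dy, dx) in zip(lows, offsets))
        g = g + lam * tv_subgrad(z)
        g_norm = np.abs(g).sum()
        if g_norm == 0:
            break
        z = z - (s0 * q ** n / g_norm) * g  # mu_n * ||g||_1 = s_n
        f = objective(z, lows, offsets, lam)
        if f < best_f:
            best, best_f = z.copy(), f
    return best

# Toy check: four low-res frames whose offsets together cover every high-res pixel.
rng = np.random.default_rng(0)
z_true = rng.uniform(0.0, 20.0, size=(8, 8))
offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
lows = [T(blur(z_true), dy, dx) for dy, dx in offsets]
z_hat = super_resolve(lows, offsets, z_true.shape)
```

The normalization by $\|g^{(n)}\|_1$ makes the $l_1$ length of every update equal to $s_n$, independent of how many residual signs are non-zero, which is why a fixed geometric schedule for $s_n$ can be chosen a priori.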

Figure 1: Face super-resolution with a magnification factor of 4 from 10 input images. a) source low-resolution images; b, c, d) single-image interpolation using b) nearest neighbor, c) bilinear interpolation, d) the regularization-based method [26]; e) proposed super-resolution result.

4. EMD-BASED ILLUMINATION ARTIFACT REMOVAL
The input video data for super-resolution suffers from illumination artifacts. To overcome this problem we use the Empirical Mode Decomposition (EMD) method. EMD is a multiresolution decomposition technique which was first introduced by Huang et al. in [18]. This method is appropriate for non-linear, non-stationary signal analysis. The concept of EMD is to decompose the signal into a set of zero-mean functions called Intrinsic Mode Functions (IMFs) and a residue. As the decomposition level increases, the complexity (frequency) of the IMFs decreases. In comparison with other time-frequency analysis tools such as Fourier or wavelet analysis, EMD is fully data-driven, i.e., there are no predetermined basis functions. First we describe the algorithm for 1-D signals.

Huang et al. defined an IMF as a function that satisfies two conditions: a) the number of extrema and the number of zero-crossings are equal or differ at most by one; b) at any point, the mean value of the upper envelope defined by the local maxima and the lower envelope defined by the local minima is zero. Let $f(t)$ be the signal to be decomposed. Using this definition, the EMD algorithm can be described as follows:

1. Identify all local extrema of $f(t)$.
2. Interpolate all local maxima to get the upper envelope $e_{max}(t)$ and all local minima to get the lower envelope $e_{min}(t)$.
3. Compute the local mean $m(t) = \frac{e_{max}(t) + e_{min}(t)}{2}$.
4. Compute $d(t) = f(t) - m(t)$; $d(t)$ is the candidate to be an IMF.
5. If $d(t)$ satisfies the definition of IMF, subtract it from the signal, $r(t) = f(t) - d(t)$, and go to step 6. If $d(t)$ does not satisfy the definition of IMF, go to step 1 and use $d(t)$ instead of $f(t)$. Steps 1-5 are repeated until $d(t)$ satisfies the definition of IMF.
6. If the residue $r(t)$ is a monotone function, the decomposition process is complete. Otherwise, go to step 1 and use $r(t)$ instead of $f(t)$.

The process of obtaining each IMF (steps 1-4) is called the sifting process. When the decomposition is complete, $f(t)$ can be written as

$$f(t) = \sum_{k=1}^{N} \varphi_k(t) + r(t),$$

where $\varphi_k(t)$ is the $k$-th IMF and $r(t)$ is the residue.

There are several crucial points in the algorithm: the interpolation method for upper- and lower-envelope calculation, the boundary processing method, the stopping criterion, and the number of iterations in the sifting process. Huang et al. use cubic spline interpolation to estimate the upper and lower envelopes [18]. Other estimation methods are also used: B-splines [19], an optimization-based method [20], etc. Several methods to process boundary points for interpolation of the envelopes have been suggested. One way is to consider the end points of the signal as both a maximum and a minimum at the same time. Another way is to extend the signal, construct the envelopes for the extended signal, and then use only the part within the original definition domain [21]. In practice it is very difficult to obtain a physically meaningful IMF that strictly satisfies the definition, so different sifting stopping criteria have been introduced. Often the standard deviation SD computed from two consecutive sifting results [18], [22] is used as the criterion:

$$SD = \sum_{t=0}^{T} \frac{\left| d_{k-1}(t) - d_k(t) \right|^2}{d_{k-1}^2(t)}.$$
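The 1-D sifting loop and the SD criterion can be sketched as follows. This is a minimal sketch, not the authors' code, with several labeled assumptions: envelopes use linear interpolation (`np.interp`) instead of the cubic splines of [18]; the end points are treated as both a maximum and a minimum (one of the boundary rules mentioned above); the SD threshold 0.25 is picked from the 0.2 to 0.3 range; the "definition of IMF" test in step 5 is replaced by the SD criterion plus an iteration cap, because the pointwise SD formula becomes very large wherever $d_{k-1}(t)$ is close to zero; and decomposition stops when the residue has fewer than two interior extrema, a practical stand-in for monotonicity. Function names are hypothetical.

```python
import numpy as np

def local_extrema(f):
    """Strict interior local maxima/minima of a 1-D signal. The two end points are
    added to both sets (the 'end point is both a maximum and a minimum' rule)."""
    i = np.arange(1, len(f) - 1)
    maxima = i[(f[i] > f[i - 1]) & (f[i] > f[i + 1])]
    minima = i[(f[i] < f[i - 1]) & (f[i] < f[i + 1])]
    ends = [0, len(f) - 1]
    n_interior = len(maxima) + len(minima)
    return np.union1d(maxima, ends), np.union1d(minima, ends), n_interior

def sift(f, sd_limit=0.25, max_iter=50):
    """Steps 1-4 repeated until the SD criterion or the iteration cap is met."""
    t = np.arange(len(f))
    d = f.copy()
    for _ in range(max_iter):
        imax, imin, _ = local_extrema(d)
        e_max = np.interp(t, imax, d[imax])  # upper envelope (linear, not cubic spline)
        e_min = np.interp(t, imin, d[imin])  # lower envelope
        d_new = d - (e_max + e_min) / 2.0    # d(t) = f(t) - m(t)
        sd = np.sum((d - d_new) ** 2 / (d ** 2 + 1e-12))  # epsilon avoids 0/0
        d = d_new
        if sd < sd_limit:
            break
    return d

def emd(f, max_imfs=8):
    """Returns (list of IMFs, residue) such that f == sum(IMFs) + residue."""
    r = np.asarray(f, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if local_extrema(r)[2] < 2:  # (almost) monotone residue: stop
            break
        d = sift(r)
        imfs.append(d)
        r = r - d
    return imfs, r

t = np.arange(256)
signal = np.sin(2 * np.pi * t / 16) + 0.5 * np.sin(2 * np.pi * t / 64) + 0.01 * t
imfs, residue = emd(signal)
```

Because each IMF is subtracted from the running residue, the decomposition is exactly invertible by summation regardless of how well the IMF conditions are met; the quality of the individual IMFs depends on the envelope and stopping choices listed above.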

A typical value of SD is between 0.2 and 0.3. Limiting the local mean $m(t)$ of the sifting result at each point is also used [23]. The number of iterations in the sifting process can also be restricted [22], [24].

The 2-D case of EMD is still an open problem, but it has the same crucial points: the extrema location process, the interpolation method for upper- and lower-envelope estimation, the boundary processing method, the stopping criterion, and the number of iterations in the sifting process. In our approach we locate local maxima and minima as follows: $f(i, j)$ is a local maximum if $f(i, j) \ge f(k, l)$ for all $i - 1 \le k \le i + 1$, $j - 1 \le l \le j + 1$, and $f(i, j)$ is a local minimum if $f(i, j) \le f(k, l)$ for all $i - 1 \le k \le i + 1$, $j - 1 \le l \le j + 1$. We use Delaunay triangulation-based linear interpolation to estimate the envelopes and even extension for boundary processing. This even extension is illustrated in Figure 2.

Figure 3: EMD example. a) original image; b) 1st IMF; c) 2nd IMF; d) 3rd IMF; e) 4th IMF; f) residue.
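The 3×3 extremum test and the even boundary extension can be sketched in NumPy as follows. Assumptions, labeled: the even extension is implemented as whole-sample mirroring (`np.pad` with `mode="reflect"`), since the variant is not specified; the Delaunay-based envelope interpolation is not shown (SciPy's `scipy.interpolate.griddata` with `method="linear"` is one possible tool for it); function names are hypothetical. With the non-strict inequalities of the definition, pixels on a flat plateau are reported as both maxima and minima.

```python
import numpy as np

def even_extend(f, width):
    """Even (mirror) extension of an image by `width` pixels on every side.
    Whole-sample mirroring (mode='reflect') is an assumed variant."""
    return np.pad(f, width, mode="reflect")

def local_extrema_2d(f):
    """Masks of local maxima / minima: f(i,j) is a maximum if f(i,j) >= f(k,l)
    for all |k - i| <= 1, |l - j| <= 1 (the centre itself included)."""
    g = even_extend(f, 1)
    n, m = f.shape
    # The 9 shifted views of the padded image cover the 3x3 neighbourhood of each pixel.
    neigh = np.stack([g[a:a + n, b:b + m] for a in range(3) for b in range(3)])
    return f >= neigh.max(axis=0), f <= neigh.min(axis=0)

img = np.zeros((5, 5))
img[2, 2] = 1.0   # isolated peak
img[4, 0] = -1.0  # pit in a corner
is_max, is_min = local_extrema_2d(img)
```

The padded image produced by `even_extend` is also what the envelope interpolation would be run on, so that envelopes near the border are supported by mirrored extrema rather than extrapolated.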

Figure 2: Boundary processing. a) original image; b) extended image for envelope construction.

As a stopping criterion, we use the limitation of the local mean in conjunction with a restriction on the number of iterations in the sifting process. An example of EMD applied to a face image from video is shown in Figure 3. The histograms of the IMF and residue images were adjusted to illustrate the behavior of these functions.

The EMD method can be used for illumination artifact removal. The idea is based on the decomposition of the initial image using EMD:

$$f(i, j) = \sum_{k=1}^{N} \varphi_k(i, j) + r(i, j),$$

where $\varphi_k$ are the IMFs and $r$ is the residue. Illumination artifacts are considered as low-frequency information, which can be eliminated from the image. We obtain the enhanced image using the first several IMFs, which are the highest-frequency components [25]:

$$\hat{f}(i, j) = \sum_{k=1}^{M} \varphi_k(i, j), \quad M < N.$$

In [25] the authors use 1-D EMD for illumination correction, representing the image as a 1-D signal. In our approach we use 2-D EMD, which is more effective (see a result in Figure 4).

Figure 4: Illumination artifact removal. a, b) original images; c, d) processed images.
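Once a 2-D decomposition is available, the enhanced image is simply the partial sum of the first $M$ IMFs. The helper below is a minimal sketch with a hypothetical name; the "IMFs" in the example are synthetic stand-ins rather than the output of a real 2-D EMD, and the choice of $M$ is data-dependent and not fixed by the text.

```python
import numpy as np

def remove_illumination(imfs, m):
    """Enhanced image: sum of the first m IMFs (highest frequencies first).
    The remaining IMFs and the residue, assumed to carry the illumination, are dropped."""
    if not 1 <= m <= len(imfs):
        raise ValueError("m must be between 1 and len(imfs)")
    return np.sum(imfs[:m], axis=0)

# Synthetic stand-ins for a decomposition f = imf1 + imf2 + residue.
y, x = np.mgrid[0:32, 0:32]
imf1 = np.sin(x * np.pi / 2) * np.sin(y * np.pi / 2)  # fine detail
imf2 = 0.3 * np.sin(x * np.pi / 8)                    # coarser structure
residue = 0.05 * x + 0.02 * y                         # smooth illumination ramp
f = imf1 + imf2 + residue
enhanced = remove_illumination([imf1, imf2], 2)
```

A smaller $M$ removes more low-frequency content, which suppresses stronger shading but also discards coarse facial structure, so $M$ trades illumination suppression against detail preservation.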

5. RESULTS
The results of the super-resolution method depend drastically on the face video data. The super-resolution method combined with the EMD algorithm for illumination correction typically gives a significant enhancement of the tracked face. To illustrate the general effect of the EMD enhancement, we used a set of non-sequential images with artificially degraded illumination. This set is not typical for practical video data, where the illumination change is continuous, but even in this case the EMD-based result is reasonable (see Figure 5). Our tests show that for non-sequential images, the single-image regularization resampling method [26] with EMD enhancement gives a better result than the above super-resolution method.

Figure 5: An application of illumination artifact removal for super-resolution. a) original images; b) super-resolution result; c) EMD-processed images; d) super-resolution result for EMD-processed images.

6. CONCLUSION
A super-resolution method for face video data based on the Tikhonov regularization approach and a block motion model has been proposed. The approach was found promising for use in real applications. The performance of the method has been improved by applying the 2-D empirical mode decomposition method to suppress illumination artifacts in video. Research on the use of 2-D intrinsic mode functions inside the super-resolution algorithm is in progress.

This research was supported by RFBR grants 06-01-39006_ and 06-01-00789-a.

7. REFERENCES
[1] S. Baker, T. Kanade. Hallucinating faces // IEEE International Conference on Automatic Face and Gesture Recognition, March 2000.
[2] S. Baker, T. Kanade. Limits on Super-Resolution and How to Break Them // IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 9, Sep. 2002, pp. 1167–1183.
[3] Manuel Carcenac. A modular neural network for super-resolution of human faces // Applied Intelligence, to be published. http://www.springerlink.com/content/n7228127462q2457/
[4] Sung Won Park, Marios Savvides. Breaking the limitation of manifold analysis for super-resolution of facial images // IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, April 2007, pp. 573–576.
[5] Wei Liu, Dahua Lin, Xiaoou Tang. Face Hallucination Through Dual Associative Learning // IEEE International Conference on Image Processing, Vol. 1, Sept. 2005, pp. 873–876.
[6] Hong Chang, Dit-Yan Yeung, Yimin Xiong. Super-Resolution Through Neighbor Embedding // Proceedings of Computer Vision and Pattern Recognition, Vol. 1, 2004, pp. 275–282.
[7] Wei Geng, Yunhong Wang. Aging Simulation of Face Images Based on Super-Resolution // Communications in Computer and Information Science, Vol. 2, Part 21, 2007, pp. 930–939.
[8] S. Farsiu, D. Robinson, M. Elad, P. Milanfar. Fast and Robust Multi-Frame Super-Resolution // IEEE Transactions on Image Processing, Vol. 13, No. 10, October 2004, pp. 1327–1344.
[9] S. Borman, Robert L. Stevenson. Super-Resolution from Image Sequences: A Review // Midwest Symposium on Circuits and Systems, 1998, pp. 374–378.
[10] Jianguo Lu, Anni Cai, Fei Su. A New Algorithm for Extracting High-Resolution Face Image from Video Sequence // International Conference on Computational Intelligence and Security, Vol. 2, Nov. 2006, pp. 1689–1694.
[11] Jiangang Yu, Bir Bhanu. Super-resolution Restoration of Facial Images in Video // 18th International Conference on Pattern Recognition, Vol. 4, 2006, pp. 342–345.
[12] R. R. Schultz, R. L. Stevenson. Extraction of high-resolution frames from video sequences // IEEE Transactions on Image Processing, Vol. 5, No. 6, June 1996, pp. 996–1011.
[13] A. J. Patti, M. I. Sezan, A. M. Tekalp. Superresolution video reconstruction with arbitrary sampling lattices and nonzero aperture time // IEEE Transactions on Image Processing, Vol. 6, No. 8, Aug. 1997, pp. 1064–1076.
[14] P. E. Eren, M. I. Sezan, A. Tekalp. Robust, Object-Based High-Resolution Image Reconstruction from Low-Resolution Video // IEEE Transactions on Image Processing, Vol. 6, No. 10, 1997, pp. 1446–1451.
[15] Ha V. Le, Guna Seetharaman. A Super-Resolution Imaging Method Based on Dense Subpixel-Accurate Motion Fields // Proceedings of the Third International Workshop on Digital and Computational Video, Nov. 2002, pp. 35–42.
[16] F. Lin, J. Cook, V. Chandran, S. Sridharan. Face recognition from super-resolved images // Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, Vol. 2, 2005, pp. 667–670.
[17] S. Boyd, L. Xiao, A. Mutapcic. Subgradient methods // Lecture notes of EE392o, Stanford University, 2003.
[18] N. E. Huang, Z. Shen, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis // Royal Society of London Proceedings Series A, Vol. 454, Issue 1971, 1998, pp. 903–1005.
[19] Q. Chen, N. Huang, S. Riemenschneider, Y. Xu. A B-spline approach for empirical mode decompositions // Advances in Computational Mathematics, Vol. 24, 2006, pp. 171–195.
[20] Yoshikazu Washizawa, Toshihisa Tanaka, Danilo P. Mandic, Andrzej Cichocki. A Flexible Method for Envelope Estimation in Empirical Mode Decomposition // Lecture Notes in Computer Science, Vol. 4253, 2006, pp. 1248–1255.
[21] Kan Zeng, Ming-Xia He. A simple boundary process technique for empirical mode decomposition // Proceedings of IEEE International Geoscience and Remote Sensing Symposium, Vol. 6, Sept. 2004, pp. 4258–4261.
[22] Liu Wei, Xu Weidong, Li Lihua. Medical Image Retrieval Based on Bidimensional Empirical Mode Decomposition // Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, Oct. 2007, pp. 641–646.
[23] A. Linderhed. 2D empirical mode decompositions in the spirit of image compression // Proceedings of SPIE, Wavelet and Independent Component Analysis Applications IX, Vol. 4738, pp. 1–8.
[24] Christophe Damerval, Sylvain Meignen, Valérie Perrier. A Fast Algorithm for Bidimensional EMD // IEEE Signal Processing Letters, Vol. 12, No. 10, Oct. 2005, pp. 701–704.
[25] R. Bhagavatula, M. Savvides. Analyzing Facial Images using Empirical Mode Decomposition for Illumination Artifact Removal and Improved Face Recognition // Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, April 2007, pp. 505–508.
[26] Alexey Lukin, Andrey S. Krylov, Andrey Nasonov. Image Interpolation by Super-Resolution // Graphicon 2006 conference proceedings, Novosibirsk, Russia, 2006, pp. 239–242. http://imaging.cs.msu.ru/software/

About authors
Andrey S. Krylov is an associate professor and head of the Laboratory of Mathematical Methods of Image Processing, Faculty of Computational Mathematics and Cybernetics, Moscow Lomonosov State University. Email: kryl@cs.msu.ru

Andrey V. Nasonov is a member of the scientific staff of the Faculty of Computational Mathematics and Cybernetics, Moscow Lomonosov State University. Email: nasonov@cs.msu.ru

Dmitry V. Sorokin is a student of the Faculty of Computational Mathematics and Cybernetics, Moscow Lomonosov State University. Email: sorokin_dm@bk.ru