## Abstract

Disparity estimation for binocular images is an important problem for many visual tasks such as 3D environment reconstruction, digital holography, virtual reality and robot navigation. Conventional approaches rely on the brightness constancy assumption to establish spatial correspondences between a pair of images. However, in the presence of large illumination variation and severe noise contamination, these approaches fail to generate accurate disparity maps. To obtain robust disparity estimates in such situations, we first propose a model, the color monogenic curvature phase, which describes local features of color images by embedding the monogenic curvature signal into the quaternion representation. We then propose a multiscale framework that estimates disparities by coupling the advantages of the color monogenic curvature phase and mutual information. Both indoor and outdoor images with large brightness variation are used in the experiments, and the results demonstrate that our approach achieves good performance even under large illumination change and severe noise contamination.

© 2012 Optical Society of America

## 1. Introduction

Disparity estimation for binocular images is an important problem for many visual tasks such as 3D environment reconstruction, digital holography, virtual reality and robot navigation. Typically, a matching cost is calculated at every pixel for all disparities under consideration. Conventional approaches usually assume constant intensities at matching image positions. Commonly used pixel-based matching costs are absolute differences, squared differences and sampling-insensitive absolute differences [1]. Window-based matching costs include the sums of absolute or squared differences and normalized cross correlation [2]. However, in the presence of illumination change, the constant intensity constraint no longer holds and the resulting disparity map contains many errors. Mutual information, as an alternative matching cost, has been used to compute visual correspondence because of its ability to handle some brightness variations [3, 4]. In [5], Geiger et al. proposed a fast and efficient large-scale stereo matching approach. This work achieves state-of-the-art performance without the need for global optimization; however, it suffers under large illumination change.

In contrast to intensity, phase information, as an important image feature, has the advantage of being invariant to illumination change. Unlike gradient information, phase responds differently to lines and edges. It carries the most significant structure information, and the original image can be reconstructed from the phase information alone [6]. In [7, 8], the rotationally invariant monogenic phase model was proposed for gray images. Later, Demarcq et al. [9] generalized it to handle color images. Unfortunately, the monogenic phase cannot yield accurate results for highly curved lines and edges. In our previous work [10], we proposed the monogenic curvature phase to model curved lines and edges. Although it has been applied to compute visual correspondence with good performance [11], it can only process gray images and does not take multiscale information into consideration.

The main goal of this paper is to estimate disparities robustly under large illumination change and severe noise contamination. To this end, we first propose a model, the color monogenic curvature phase, which describes the features of color images by embedding the monogenic curvature signal into the quaternion framework. We then present a multiscale method that computes the disparity map by coupling the advantages of mutual information and the color monogenic curvature phase. To illustrate the effectiveness of the proposed approach, we include both indoor and outdoor images with large illumination change in the experiments. The experimental results demonstrate that our approach achieves good performance even under large brightness variation and severe noise corruption.

## 2. Color monogenic curvature phase

#### 2.1. Monogenic curvature signal

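Given a 2D gray image $f(x, y)$, $(x, y) \in \mathbb{R}^{2}$, the monogenic curvature signal [10] is defined as

$$\mathbf{f}_{mc}(x, y) = \left[f_{1}(x, y)\;\; f_{2}(x, y)\;\; f_{3}(x, y)\right]^{T},$$

where the first component $f_{1}$ is obtained from the image as described in [10]. Applying the second order Hilbert transform to $f_{1}$ yields the other two components $f_{2}$ and $f_{3}$ of the monogenic curvature signal. In the frequency domain, the second order Hilbert transform reads $H_{2} = [\cos 2\alpha \;\; \sin 2\alpha]^{T}$, where $\alpha$ is the polar angle of the frequency coordinates. The other two components of the monogenic curvature signal are respectively given by

$$f_{2} = \mathcal{F}^{-1}\left\{\cos 2\alpha \, F_{1}\right\}, \qquad f_{3} = \mathcal{F}^{-1}\left\{\sin 2\alpha \, F_{1}\right\},$$

where $\mathcal{F}^{-1}$ refers to the inverse Fourier transform and $F_{1}$ is the Fourier transform of $f_{1}$.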
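The frequency-domain step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `second_order_hilbert` is hypothetical, the discrete frequency grid comes from `np.fft.fftfreq`, and setting the multiplier to zero at the DC term (where the polar angle is undefined) is an assumption.

```python
import numpy as np

def second_order_hilbert(f1):
    """Return (f2, f3): inverse FFT of [cos 2a, sin 2a] applied to F1."""
    rows, cols = f1.shape
    v = np.fft.fftfreq(rows)[:, None]    # vertical frequency coordinate
    u = np.fft.fftfreq(cols)[None, :]    # horizontal frequency coordinate
    alpha = np.arctan2(v, u)             # polar angle in the frequency plane
    cos2a = np.cos(2.0 * alpha)
    sin2a = np.sin(2.0 * alpha)
    cos2a[0, 0] = 0.0                    # assumption: suppress the DC term,
    sin2a[0, 0] = 0.0                    # where the angle is undefined
    F1 = np.fft.fft2(f1)
    # The real part is taken because a real input should give real outputs.
    f2 = np.real(np.fft.ifft2(cos2a * F1))
    f3 = np.real(np.fft.ifft2(sin2a * F1))
    return f2, f3
```

For a pattern that varies only along the horizontal axis, the polar angle is 0 or π, so under these assumptions `f2` reproduces the zero-mean input and `f3` vanishes.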

Convolving the monogenic curvature signal with the Poisson kernel $h_{p} = \frac{s}{2\pi \left(x^{2} + y^{2} + s^{2}\right)^{3/2}}$ results in the monogenic curvature scale-space $\mathbf{f}_{mc}(x, y, s)$, with $s$ being the scale parameter. The monogenic curvature scale-space performs a split of identity: from it, three independent local features, i.e. the amplitude, the main orientation and the monogenic curvature phase, can be obtained simultaneously [10]. In their definitions, angles take values in $(-\pi, \pi]$ and $\mathbf{u}(x, y, s) = \left[f_{2}(x, y, s)\;\; f_{3}(x, y, s)\right]^{T}$.
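As a rough illustration, the sketch below samples the Poisson kernel $h_{p}$ on a pixel grid and derives the three local features from the components $f_{1}$, $f_{2}$, $f_{3}$. The kernel follows the formula in the text. The feature formulas are assumptions in the style of the monogenic signal [7, 8] (amplitude as the Euclidean norm, orientation as half the angle of $\mathbf{u}$, phase as $\mathbf{u}/|\mathbf{u}|$ scaled by an arctangent) and may differ from the exact definitions in [10]; the function names are hypothetical.

```python
import numpy as np

def poisson_kernel(s, half_size):
    """Sample h_p = s / (2*pi*(x^2 + y^2 + s^2)^(3/2)) on a square grid."""
    ax = np.arange(-half_size, half_size + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    return s / (2.0 * np.pi * (x**2 + y**2 + s**2) ** 1.5)

def local_features(f1, f2, f3):
    """Assumed monogenic-style split into amplitude, orientation and phase."""
    norm_u = np.hypot(f2, f3)                     # |u| with u = [f2, f3]^T
    amplitude = np.sqrt(f1**2 + f2**2 + f3**2)
    orientation = 0.5 * np.arctan2(f3, f2)        # assumed: half the angle of u
    phase_angle = np.arctan2(norm_u, f1)          # assumed phase magnitude
    safe = np.where(norm_u > 0, norm_u, 1.0)      # avoid division by zero
    phase = np.stack([f2 / safe, f3 / safe]) * phase_angle
    return amplitude, orientation, phase
```

The kernel integrates to one over the whole plane, so a truncated discrete kernel sums to slightly less than one; renormalising it before convolution is a common practical choice.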

#### 2.2. Color monogenic curvature scale-space

The monogenic curvature scale-space was designed to describe the characteristics of gray images; color information is not incorporated in this model. In [13, 14], quaternions were introduced to represent color images. A color image $\mathbf{f}(x, y)$ in RGB color space can be represented by encoding its three channels as a pure quaternion

$$\mathbf{f}(x, y) = f_{r}(x, y)\,i + f_{g}(x, y)\,j + f_{b}(x, y)\,k,$$

where $i$, $j$ and $k$ are the three imaginary units, and $f_{r}$, $f_{g}$ and $f_{b}$ indicate the red, green and blue channels of the color image. We are thus inspired to extend the monogenic curvature scale-space to the color domain by embedding it into the quaternion framework. Similar to the color image representation, corresponding components of the monogenic curvature scale-space are considered as three channels to be encoded in a pure quaternion. Therefore, the color monogenic curvature scale-space $\mathbf{f}_{cmc}$ can be constructed as

$$\mathbf{f}_{cmc} = \mathbf{f}_{1mc}\,i + \mathbf{f}_{2mc}\,j + \mathbf{f}_{3mc}\,k,$$

where $\mathbf{f}_{nmc}$ refers to the monogenic curvature scale-space of the $n$th color channel, with $n = 1, 2, 3$ indexing the three color channels.
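The pure-quaternion encoding of a color image can be illustrated with plain NumPy arrays. The sketch below is only an illustration of the representation from [13, 14]: the storage layout `[real, i, j, k]` and the function names are assumptions, and the Hamilton product is included to show that the squared pure quaternion has a purely real result.

```python
import numpy as np

def rgb_to_pure_quaternion(rgb):
    """Map an (H, W, 3) RGB image to (H, W, 4) as [real, i, j, k], real = 0."""
    rgb = np.asarray(rgb, dtype=float)
    zeros = np.zeros(rgb.shape[:-1] + (1,))
    return np.concatenate([zeros, rgb], axis=-1)

def hamilton_product(p, q):
    """Quaternion product for arrays whose last axis is [real, i, j, k]."""
    a1, b1, c1, d1 = np.moveaxis(p, -1, 0)
    a2, b2, c2, d2 = np.moveaxis(q, -1, 0)
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,   # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,   # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,   # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,   # k part
    ], axis=-1)
```

Squaring a pure quaternion pixel yields the negative squared norm of its RGB vector in the real part and zero imaginary parts, which is a quick sanity check on the encoding.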

The corresponding color monogenic curvature phase, denoted $\Phi_{cmc}$, is then extracted from $\mathbf{f}_{cmc}$.

## 3. Disparity estimation

To deal with stereo analysis under large brightness variation and noise corruption, we propose a multiscale method that combines the advantages of mutual information and the color monogenic curvature phase. Figure 2 illustrates the structure of the proposed multiscale disparity estimation approach. Given a color image pair $I_{l}$ and $I_{r}$, the corresponding phase information $\Phi_{l}$ and $\Phi_{r}$ can be extracted by applying the color monogenic curvature phase model. Based on $\Phi_{l}$ and $\Phi_{r}$, two pyramids are correspondingly constructed by down-sampling the original phase images. At each scale $s$, the disparity map can be computed by using the mutual information of the two phase images $\Phi_{l,s}$ and $\Phi_{r,s}$ as the matching cost. Starting from the coarsest scale, the estimated disparity map is used as the initialization for the next scale, and this continues to the finest scale.
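The coarse-to-fine control flow described above can be sketched as follows. This is a schematic under stated assumptions: pyramids are built by plain factor-two subsampling, disparities are doubled when moving to the next finer level, and the per-scale solver `estimate` is a hypothetical callable standing in for the mutual-information-based estimation.

```python
import numpy as np

def build_pyramid(img, levels):
    """Return [finest, ..., coarsest] by factor-2 subsampling (assumption)."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(pyramid[-1][::2, ::2])
    return pyramid

def upsample_disparity(disp, shape):
    """Repeat each value 2x2, double it, and crop to the finer grid."""
    up = np.repeat(np.repeat(disp, 2, axis=0), 2, axis=1) * 2
    return up[:shape[0], :shape[1]]

def coarse_to_fine(phase_l, phase_r, levels, estimate):
    """Run `estimate(left, right, init)` from the coarsest to the finest scale."""
    pyr_l = build_pyramid(phase_l, levels)
    pyr_r = build_pyramid(phase_r, levels)
    disp = np.zeros(pyr_l[-1].shape[:2], dtype=int)   # coarsest-level start
    for level in range(levels - 1, -1, -1):
        if level < levels - 1:
            disp = upsample_disparity(disp, pyr_l[level].shape[:2])
        disp = estimate(pyr_l[level], pyr_r[level], disp)
    return disp
```

Doubling the disparity at each transition reflects the fact that a shift of one pixel at a coarse level corresponds to two pixels at the next finer level.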

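At a given scale, the mutual information of the color monogenic curvature phase image pair $\Phi_{l,s}$ and $\Phi_{r,s}$ can be defined as

$$MI(\Phi_{l,s}, \Phi_{r,s}) = H(\Phi_{l,s}) + H(\Phi_{r,s}) - H(\Phi_{l,s}, \Phi_{r,s}), \tag{13}$$

where $H(\Phi_{l,s})$ and $H(\Phi_{r,s})$ are the Shannon entropies, each given by

$$H(\Phi) = -E_{\Phi}\left[\log P(\Phi)\right] = -\sum_{\phi_{i} \in \Omega_{\phi}} P(\Phi = \phi_{i}) \log P(\Phi = \phi_{i}),$$

where $E_{\Phi}$ indicates the expected value function of $\Phi$, $P(\Phi)$ is the probability of $\Phi$, $\Omega_{\phi}$ refers to the domain over which the random variable can range and $\phi_{i}$ is an event in this domain. $H(\Phi_{l,s}, \Phi_{r,s})$ indicates the joint entropy of $\Phi_{l,s}$ and $\Phi_{r,s}$, which is represented in the following form

$$H(\Phi_{l,s}, \Phi_{r,s}) = -E\left[\log P(\Phi_{l,s}, \Phi_{r,s})\right],$$

where $E$ refers to the expectation and $P(\Phi_{l,s}, \Phi_{r,s})$ is the joint distribution of $\Phi_{l,s}$ and $\Phi_{r,s}$. Since Eq. (13) defines the mutual information for the whole phase image, similar to [16], we approximate the whole mutual information as the sum of the pixel-wise mutual information and use it as a data cost, that is

$$MI(\Phi_{l,s}, \Phi_{r,s}) \approx \sum_{p} mi\left(\Phi_{l,s}(p), \Phi_{r,s}(p + d_{p})\right),$$

where $mi(\cdot, \cdot)$ is the pixel-wise mutual information term and $d_{p}$ refers to the disparity at the pixel $p$.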

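Typically, disparity estimation can be obtained by minimizing the following energy expression

$$E(d) = E_{data}(d) + E_{smooth}(d),$$

where $E_{data}$ is a matching cost which works as a similarity measure and $E_{smooth}$ is the smoothness energy which penalizes disparity differences. In this paper, we use the mutual information of the color monogenic curvature phase image as the matching cost. Based on the approximation, the pixel-wise data energy $E_{data}$ can be formulated as

$$E_{data}(d) = -\sum_{p} mi\left(\Phi_{l,s}(p), \Phi_{r,s}(p + d_{p})\right),$$

where $mi(\cdot, \cdot)$ denotes the pixel-wise mutual information between matched phase values and $d_{p}$ is the disparity at the pixel $p$; the negative sign turns the maximization of mutual information into an energy minimization. The smoothness energy is

$$E_{smooth}(d) = \sum_{p} \sum_{q \in \mathcal{N}(p)} V_{pq}(d_{p}, d_{q}),$$

where $\mathcal{N}(p)$ is the set of neighbouring pixels of the pixel $p$, and $V_{pq}$ is a pairwise penalty on the disparities of neighbouring pixels, with $\lambda$ being a weighting parameter. The graph-cuts expansion algorithm proposed in [17] is employed to minimize the energy function for the dense disparity computation.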
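To make the structure of the energy concrete, the sketch below evaluates it for a given disparity map. It only scores a labelling and does not perform the graph-cuts minimization of [17]. The precomputed `cost_volume`, the 4-connected neighbourhood and the Potts form of the pairwise penalty (a constant `lam` whenever neighbouring disparities differ) are assumptions made for illustration.

```python
import numpy as np

def total_energy(cost_volume, disparity, lam):
    """Evaluate E = E_data + E_smooth for an integer disparity map.

    cost_volume[y, x, d] holds the data cost of disparity d at pixel (y, x).
    """
    rows, cols = disparity.shape
    yy, xx = np.indices((rows, cols))
    e_data = cost_volume[yy, xx, disparity].sum()
    # Assumed Potts penalty over 4-connected pairs, each pair counted once.
    horizontal = np.count_nonzero(disparity[:, 1:] != disparity[:, :-1])
    vertical = np.count_nonzero(disparity[1:, :] != disparity[:-1, :])
    e_smooth = lam * (horizontal + vertical)
    return e_data + e_smooth
```

A larger `lam` favours piecewise-constant disparity maps, while a smaller value lets the data term dominate.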

## 4. Experimental results

In this section, we present experimental results to demonstrate the effectiveness of the proposed approach. First, we take two datasets, “baby1” and “lampshade1”, from the Middlebury stereo benchmark [15, 18] as test images. These two datasets contain rich and sparse texture information, respectively. All images are captured under three different real lighting sources and three different exposures, and the illumination does not change uniformly over the whole image. The images are rectified and radial distortion has been removed. Since the intensity-based approach in [5] has been shown to achieve state-of-the-art stereo matching performance, we include the disparity estimates from this approach and from the gray monogenic curvature phase based approach [11] for comparison.

Figure 3 shows the estimated disparities for “baby1” obtained by the three methods. The top row shows, from left to right, two views of “baby1” with a large brightness change and the ground truth disparity. The bottom row shows the results of the intensity-based approach, the gray monogenic curvature phase based approach and the proposed approach. For two images with large illumination variation, the intensity-based approach fails to generate good results because of its sensitivity to brightness change. The proposed method produces the best results, while the gray monogenic curvature phase based method performs slightly worse because it incorporates no color information and has no multiscale implementation. The “lampshade1” dataset contains images with less texture information, which makes disparity estimation more difficult than for richly textured images. Figure 4 shows the corresponding results of the three methods. The top row contains two views of “lampshade1” with some brightness change and the ground truth disparity. The bottom row shows, from left to right, the disparities estimated by the intensity-based approach, the gray monogenic curvature phase based approach and the proposed approach. Because of the low texture, none of the three approaches generates a very good disparity map; however, our approach still performs best among them.

To evaluate the performance of our approach quantitatively, we use different lighting combinations with the same camera exposure as input image pairs and compute errors in unoccluded areas. Figure 5 shows disparity errors in unoccluded areas with respect to different lighting combinations for “baby1” and “lampshade1”. The horizontal axis represents the combination of lighting conditions, e.g. “1/3” means that the left image is taken under lighting condition 1 and the right image under lighting condition 3. In this figure, “Color phase” indicates the proposed method, “PMI” refers to the gray monogenic curvature phase based approach [11] and “Intensity” represents the intensity-based approach [5]. For all three approaches, the errors grow as the difference between lighting conditions increases; however, our approach performs best. To test the robustness of the proposed approach, we use noise-contaminated image pairs with the same lighting condition and check the estimation errors. Figure 6 shows disparity errors in unoccluded areas with respect to the signal-to-noise ratio for “baby1” and “lampshade1”. As the signal-to-noise ratio increases, the estimation errors decrease correspondingly, and our approach still outperforms the others.
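A quantitative evaluation of this kind can be sketched as below. The exact error measure and noise model are not specified in the text, so both helpers are assumptions: `bad_pixel_rate` counts unoccluded pixels whose disparity differs from the ground truth by more than a threshold (one pixel by default), and `add_noise_at_snr` adds zero-mean Gaussian noise scaled to a target signal-to-noise ratio in decibels.

```python
import numpy as np

def bad_pixel_rate(disp, gt, unoccluded, threshold=1.0):
    """Fraction of unoccluded pixels with |disp - gt| above the threshold."""
    err = np.abs(disp.astype(float) - gt.astype(float)) > threshold
    return err[unoccluded].mean()

def add_noise_at_snr(img, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal/noise power matches snr_db."""
    img = img.astype(float)
    signal_power = np.mean(img**2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return img + rng.normal(0.0, np.sqrt(noise_power), img.shape)
```

Here `unoccluded` is a boolean mask of the same shape as the disparity map, and `rng` is a NumPy random generator such as `np.random.default_rng(0)`.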

So far, the experimental image pairs have been captured indoors with ground truth available. To investigate the performance of our approach in an outdoor environment, we use two different cameras arranged as a stereo vision system to capture outdoor images with strong lighting changes. Figures 7 and 8 show different views of outdoor images with large illumination variation and the disparity maps estimated by the intensity-based approach [5], the gray monogenic curvature phase based approach [11] and the proposed method. The intensity-based approach cannot generate a good disparity map because of the large brightness change, the gray monogenic curvature phase based approach performs much better than the intensity-based one, and our approach works best.

## 5. Conclusions

This paper addresses the problem of estimating robust disparity maps under large illumination change and noise contamination. Conventional approaches are based on the brightness constancy assumption and therefore fail to generate accurate disparities under these conditions. To obtain robust disparity estimates, we first propose a model, the color monogenic curvature phase, by embedding the monogenic curvature signal into the quaternion representation; this generalizes the monogenic curvature phase to the color domain. We then propose a multiscale framework that estimates disparities by coupling the advantages of mutual information and the color monogenic curvature phase. Both indoor and outdoor images with large illumination change are used in the experiments. The results show that our approach outperforms the intensity-based and monogenic curvature phase based approaches, and that it achieves good performance even under large brightness variation and noise corruption.

## Acknowledgment

This work has been supported by the National Natural Science Foundation of China (61103071, 61105122, 61103072), the Natural Science Foundation of Shanghai, China (11ZR1440200), the Research Fund for the Doctoral Program of Higher Education of China (20110072120065) and the Key Basic Program of the Science and Technology Commission of Shanghai Municipality of China (10DJ1400300).

## References and links

**1. **S. Birchfield and C. Tomasi, “A pixel dissimilarity measure that is insensitive to image sampling,” IEEE Trans. Pattern Anal. Mach. Intell. **20**, 401–406 (1998).

**2. **H. Moravec, “Toward automatic visual obstacle avoidance,” in Proceedings of 5th International Joint Conference on Artificial Intelligence, (Morgan Kaufmann, 1977), pp. 584–590.

**3. **C. Fookes, M. Bennamoun, and A. Lamanna, “Improved stereo image matching using mutual information and hierarchical prior probabilities,” in Proceedings of 16th International Conference on Pattern Recognition, (IEEE, 2002), pp. 937–940.

**4. **I. Sarkar and M. Bansal, “A wavelet-based multiresolution approach to solve the stereo correspondence problem using mutual information,” IEEE Trans. Syst. Man. Cybern., B: Cybern. **37**, 1009–1014 (2007).

**5. **A. Geiger, M. Roser, and R. Urtasun, “Efficient large-scale stereo matching,” in Proceedings of 10th Asian conference on Computer vision - Volume Part I, (Springer-Verlag, 2011), pp. 25–38.

**6. **A. V. Oppenheim, “The importance of phase in signals,” Proc. IEEE **69**, 529–541 (1981).

**7. **M. Felsberg and G. Sommer, “The monogenic signal,” IEEE Trans. Signal Process. **49**, 3136–3144 (2001).

**8. **M. Felsberg and G. Sommer, “The monogenic scale-space: a unifying approach to phase-based image processing in scale-space,” J. Math. Imaging Vision **21**, 5–26 (2004).

**9. **G. Demarcq, L. Mascarilla, M. Berthier, and P. Courtellemont, “The color monogenic signal: application to color edge detection and color optical flow,” J. Math. Imaging Vision **40**, 269–284 (2011).

**10. **D. Zang and G. Sommer, “Signal modeling for two-dimensional image structures,” J. Visual Commun. Image **18**, 81–99 (2007).

**11. **D. Zang, J. Li, and D. Zhang, “Robust visual correspondence computation using monogenic curvature phase based mutual information,” Opt. Lett. **37**, 10–12 (2012).

**12. **F. Brackx, B. D. Knock, and H. D. Schepper, “Generalized multidimensional Hilbert transforms in Clifford analysis,” Int. J. Math. Math. Sci. **2006**, 98145 (2006).

**13. **S. J. Sangwine, “Fourier transforms of color images using quaternion or hypercomplex numbers,” Electron. Lett. **32**, 1979–1980 (1996).

**14. **N. L. Bihan and S. J. Sangwine, “Quaternion principal component analysis of color images,” in Proceedings of IEEE International Conference on Image Processing, (IEEE, 2003), pp. 809–812.

**15. **http://vision.middlebury.edu/stereo/.

**16. **J. Kim, V. Kolmogorov, and R. Zabih, “Visual correspondence using energy minimization and mutual information,” in Proceedings of IEEE International Conference on Computer Vision, (IEEE, 2003), pp. 1033–1040.

**17. **Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell. **23**, 1222–1239 (2001).

**18. **D. Scharstein and C. Pal, “Learning conditional random fields for stereo,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2007), pp. 1–8.