Summary: Applying Mid-level Vision Techniques for Video Data Compression and
John Y. A. Wang†, Edward H. Adelson‡, and Ujjaval Desai†
†Department of Electrical Engineering and Computer Science
‡Department of Brain and Cognitive Sciences
The MIT Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
M.I.T. Media Laboratory Vision and Modeling Group, Technical Report No. 263, February 1994.
Appears in Proceedings of the SPIE: Digital Video Compression on Personal Computers: Algorithms and Technologies,
vol. 2187, San Jose, February 1994.
Most image coding systems rely on signal processing concepts such as transforms, vector quantization (VQ), and motion compensation. In
order to achieve significantly lower bit rates, it will be necessary to devise encoding schemes that involve mid-level and
high-level computer vision. Model-based systems have been described, but these are usually restricted to some special class
of images such as head-and-shoulders sequences. We propose to use mid-level vision concepts to achieve a decomposition
that can be applied to a wider domain of image material. In particular, we describe a coding scheme based on a set of
overlapping layers. The layers, which are ordered in depth and move over one another, are composited in a manner similar to
traditional "cel" animation. The decomposition (the vision problem) is challenging, but we have attained promising results
on simple sequences. Once the decomposition has been achieved, the synthesis is straightforward.
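
To illustrate why synthesis is the easy half of the problem, the following is a minimal sketch of back-to-front compositing of depth-ordered layers, in the spirit of cel animation. It is not the implementation used in this work. The per-layer opacity (alpha) map, the restriction of motion to integer translations, and the names `shift` and `composite` are all assumptions made for illustration.

```python
def shift(img, dx, dy, fill=0.0):
    """Translate a 2-D grid by integer (dx, dy); exposed cells get `fill`.

    Assumption: layer motion is a pure integer translation. This is a
    simplification chosen to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def composite(layers, background=0.0):
    """Composite layers back to front into a single frame.

    `layers` is ordered from farthest to nearest. Each entry is a
    hypothetical triple (intensity, alpha, (dx, dy)), where alpha is 1.0
    where the layer is opaque and 0.0 where it is transparent.
    """
    h, w = len(layers[0][0]), len(layers[0][0][0])
    frame = [[background] * w for _ in range(h)]
    for intensity, alpha, (dx, dy) in layers:
        moved_i = shift(intensity, dx, dy)
        moved_a = shift(alpha, dx, dy)  # the opacity map moves with its layer
        for y in range(h):
            for x in range(w):
                a = moved_a[y][x]
                # Nearer layer covers what is behind it where it is opaque.
                frame[y][x] = a * moved_i[y][x] + (1.0 - a) * frame[y][x]
    return frame


# Usage: an opaque background plus a one-pixel foreground layer that has
# moved one pixel to the right.
back_i = [[0.2] * 3 for _ in range(2)]
back_a = [[1.0] * 3 for _ in range(2)]
front_i = [[0.9, 0.0, 0.0], [0.0, 0.0, 0.0]]
front_a = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
frame = composite([(back_i, back_a, (0, 0)), (front_i, front_a, (1, 0))])
```

In this toy example the foreground pixel lands at column 1 of the top row and the background shows through everywhere else. Given the layers and their motions, rendering a frame is a simple loop; the difficulty lies in recovering the layers from the image sequence in the first place.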