This is an experimental Haskell system for fast prototyping of computer vision and image processing applications.

We take advantage of Haskell's expressive power without a significant performance loss, since most of the low-level, heavy numerical computations are done by specialized libraries.

Requirements

Related work 

There is an earlier approach by the Yale Haskell Group.

Download

  • A tutorial will be available soon.

Disclaimer: this software is currently extremely provisional and unstable. It may not work "out of the box" and I do not recommend it for serious applications.

Installation

The system has been tested only on Ubuntu, but it should work on any Linux distribution. To use the modules which do not require IPP (Classifier and Vision), you only need to add -i/pathto/easyVision/lib to the ghc(i) command line (see the examples folder). The programs that require IPP (modules ImagProc and EasyVision) can be compiled using make. See (and modify as desired) the Makefile and the example programs in the compvis folder. You may need to export LD_LIBRARY_PATH=pathto/ipp/../sharedlib.

Image acquisition

Image acquisition is done by MPlayer, so you can try the examples below with your own camera, webcam, or any kind of video file. We communicate with MPlayer through a standard Unix FIFO.

Some test videos are available here. Note that:

  • Many of them are in raw DV format, which is expensive to decode.
  • In some cases a cheap TV card with external video input may be a good alternative for live video capture from consumer DV cameras.

Examples

simple image processing

The application demo.hs illustrates several typical image processing algorithms that can be chosen from a menu. The interface to the IPP functions is aware of image ROIs (regions of interest), which can usually be selected with the mouse:


Canny's edge detector


distance transform


DCT

 
contour affine normalization

pose estimation

On-line estimation of camera position from live video (a webcam) given the view of an A4 sheet of paper:

compvis$ ./pose tv:// --focal 2.6

The 3D view can be changed with the mouse as a simulated trackball. Note that the camera model shows the live video in the image plane. You can also try the video contours/frontal.dv:

compvis$ ./pose frontal.dv --size 12

Once we estimate camera position we can add virtual objects to the scene:

compvis$ ./augmented tv://

This post shows a nice demo with "dynamical" virtual objects.

We detect long straight line segments (using a binding to the segment extractor developed by my friend Pedro E. López-de-Teruel), use graph algorithms from the standard Haskell libraries to find closed four-sided polygons, apply a geometric consistency check, and finally optimize the camera parameters by minimizing the reprojection error.
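
The polygon search can be illustrated with a minimal sketch. It is not the code used in easyVision: we assume that the segments have already been reduced to links between numbered junctions, and the names undirected, quads and demoQuads are hypothetical. It relies only on Data.Graph and Data.Array, which ship with GHC, and enumerates simple cycles of length four by brute force:

```haskell
import Data.Array ((!))
import Data.Graph (Graph, Vertex, buildG, vertices)
import Data.List (nub)

-- Build an undirected graph: every segment links two junction indices,
-- so each edge is stored in both directions.
undirected :: (Vertex, Vertex) -> [(Vertex, Vertex)] -> Graph
undirected bnds es = buildG bnds (es ++ [ (b, a) | (a, b) <- es ])

-- Enumerate the simple cycles a-b-c-d-a. Each cycle is reported once:
-- it starts at its smallest vertex a and is oriented so that b < d.
quads :: Graph -> [[Vertex]]
quads g =
    [ [a, b, c, d]
    | a <- vertices g
    , b <- adj a, b > a
    , c <- adj b, c > a, c /= b
    , d <- adj c, d > b, d /= c
    , a `elem` adj d
    ]
  where
    adj v = nub (g ! v)   -- neighbours of v, ignoring repeated segments

-- Hypothetical input: a closed quadrilateral 0-1-2-3 plus a dangling segment 2-4.
demoQuads :: [[Vertex]]
demoQuads = quads (undirected (0, 4) [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4)])
```

Here demoQuads evaluates to [[0,1,2,3]]: the dangling segment does not close any polygon. In the real pipeline each candidate would then go through the geometric consistency check.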

See also the ARToolKit.

stereo reconstruction

We first check that our algorithms work with synthetic data:

examples$ runhaskell -i../lib stereo.hs

Then we can try to do the same with real images taken from two webcams simultaneously connected to the computer. The following program computes interest points, finds tentative correspondences, applies RANSAC to remove outliers, obtains the fundamental matrix, self-calibrates the cameras under the assumption of internal parameters diag(f, f, 1) with a common f, and rectifies the images to get correspondences in the same row:

compvis$ ./autostereo tv:// 'tv:// -tv device=/dev/video1'
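
The outlier removal step can be sketched independently of the geometry. The following is a minimal illustration, not the easyVision implementation: it shows only the consensus step of RANSAC on a toy line-fitting problem instead of the fundamental matrix, and the minimal samples are passed in by the caller (RANSAC proper draws them at random). All names (consensus, lineThrough, closeTo, demoInliers) are hypothetical:

```haskell
import Data.List (maximumBy, tails)
import Data.Ord (comparing)

-- Fit a model to every minimal sample and keep the one supported by the
-- largest number of data items. The list of samples must not be empty.
consensus :: (s -> m) -> (m -> a -> Bool) -> [s] -> [a] -> (m, [a])
consensus fit agrees samples dataset =
    maximumBy (comparing (length . snd))
        [ (model, filter (agrees model) dataset)
        | s <- samples
        , let model = fit s
        ]

type Point = (Double, Double)

-- Line y = m*x + c through two points with different x coordinates.
lineThrough :: (Point, Point) -> (Double, Double)
lineThrough ((x1, y1), (x2, y2)) = (m, y1 - m * x1)
  where
    m = (y2 - y1) / (x2 - x1)

-- Inlier test: vertical distance to the line within a tolerance.
closeTo :: Double -> (Double, Double) -> Point -> Bool
closeTo tol (m, c) (x, y) = abs (y - (m * x + c)) <= tol

-- Five points on y = 2x + 1 and one outlier at (4, 0). For this tiny data
-- set we simply try all pairs instead of drawing random ones.
demoInliers :: [Point]
demoInliers = snd (consensus lineThrough (closeTo 0.1) pairs pts)
  where
    pts   = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 0), (5, 11)]
    pairs = [ (p, q) | (p : rest) <- tails pts, q <- rest ]
```

demoInliers keeps the five points on the line and discards (4, 0). For the stereo problem the sample would be a minimal set of correspondences and the model a candidate fundamental matrix.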

The next step would be dense disparity estimation and 3D reconstruction, but a lot of things must first be improved (better feature descriptors and initial correspondences, etc.).

planar metric rectification

This was the first "serious" application of the system: automatic rectification of a planar scene from several perspective views (Ruiz et al., BMVC 06).

Currently the correspondences are manually selected by the user. We are working on a fully automatic, real-time version of this application.

Note: The solution to this problem involves a moderately complicated optimization process in which several intermediate stages may give meaningless results. I switched to Haskell after some unfortunate attempts to solve it with other languages. Then I started the easyVision project since I wanted to try the proposed method with real image sequences. In August 2006 I finally decided to use Haskell for all my projects.

camera combinators

Using Haskell we can easily define camera combinators of type IO Image -> IO (IO Image), i.e., functions that take a camera and produce a "virtual" camera: each call (grab) returns an image which depends on the infinite (lazy) sequence of images generated by the input camera.

Several virtual cameras can be combined to get a desired effect. For instance, the example program interpolate.hs obtains the following result on the video contours/big_plate.dv:

compvis$ ./interpolate big_plate.dv

This effect is achieved by the following composition:

(cam,ctrl) <- getCam 0 size
              >>= monitorizeIn "original" (Size 150 200) id
              >>= asFloat
              >>= drift alpha
              >>= interpolate
              >>= withPause

There is a post in the blog with more information about this.
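
To make the type IO Image -> IO (IO Image) more concrete, here is a minimal self-contained sketch. It does not use the easyVision modules: Image is replaced by a plain list of intensities, the state is kept in an IORef instead of a lazy sequence, and fromFrames, smooth, mapCam and demo are hypothetical names chosen for the illustration:

```haskell
import Data.IORef (newIORef, readIORef, writeIORef, atomicModifyIORef')

-- Stand-in for the library's image type: a flat list of pixel intensities.
type Image = [Double]

-- A camera is an action that yields a new frame on every call.
type Camera = IO Image

-- A camera combinator takes a camera and builds a "virtual" camera.
type Combinator = Camera -> IO Camera

-- Test source: replays a fixed list of frames and repeats the last one.
fromFrames :: [Image] -> IO Camera
fromFrames [] = error "fromFrames: no frames"
fromFrames frames = do
    ref <- newIORef frames
    return (atomicModifyIORef' ref step)
  where
    step (x : xs@(_ : _)) = (xs, x)      -- more frames left: advance
    step xs               = (xs, head xs) -- last frame: keep returning it

-- Stateful combinator: exponential smoothing of the input frames.
-- The first output is the first input frame unchanged.
smooth :: Double -> Combinator
smooth alpha cam = do
    ref <- newIORef Nothing
    return $ do
        new  <- cam
        prev <- readIORef ref
        let out = case prev of
                Nothing  -> new
                Just old -> zipWith (\o n -> alpha * o + (1 - alpha) * n) old new
        writeIORef ref (Just out)
        return out

-- Stateless combinator: apply a pure function to every frame.
mapCam :: (Image -> Image) -> Combinator
mapCam f cam = return (fmap f cam)

-- Combinators are chained with (>>=), as in the composition shown above.
demo :: IO [Image]
demo = do
    cam <- fromFrames [[0, 0], [4, 8], [4, 8]]
               >>= smooth 0.5
               >>= mapCam (map (* 2))
    sequence [cam, cam, cam]
```

Running demo grabs three frames from the virtual camera and yields [0,0], [4,8] and [6,12]: the smoothed frames, doubled by the second combinator.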

pattern classification combinators

The easyVision system includes a few pattern classification methods. Here are the results of some of them on an illustrative toy problem.

examples$ runhaskell -i../lib classdemo.hs

Using higher-order functions we can easily define complex machines by combining pattern recognition (meta) algorithms. For instance, the bottom-right solution is obtained by an AdaBoost of decision trees of perceptrons (!). We also provide feature extraction combinators. For instance, in a real application we recognize different objects in selected frames of a video sequence by a combination of color, texture, and eigenspace-based features:

color = const $ percentilsYUV [0.25,0.5,0.75]
texture = const highpassAverage
appearance = const $ small' (Size 20 20)

We can directly define a classifier using some of these features:

machine = distance gaussian `onP` andP [color,appearance]

Or better, we can define a classifier which combines the outputs of other classifiers:

machine = distance nearestNeighbour `onP` features

features = andP [clColor, clAppearance]

clColor = outputOf (distance ordinary)
          `ofP` normalizeMinMax (0,1)
          `ofP` andP [color, texture]

clAppearance = outputOf (multiclass mse)
               `ofP` outputOf (distance (subspace (ReconstructionQuality 0.5)))
               `ofP` appearance

Combining feature extractors is similar to ordinary function composition, except that each one takes as its first argument a sample of the problem, [(object, label)], which is used to build (learn) the feature (object1->object2) or the classifier (object->label).
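
This idea of "composition with a sample" can be sketched in a few lines. The definitions below are assumptions made for the illustration and are not copied from the Classifier modules: the type synonyms, the bodies of ofP and andP, and the toy properties rescale, total and peak are all hypothetical, and rescale only plays a role similar to normalizeMinMax:

```haskell
-- A labelled sample of the problem.
type Sample a l = [(a, l)]

-- A property is learned from a sample and then applied to objects.
type Property l a b = Sample a l -> a -> b

-- Composition: learn f on the sample, map the sample through it,
-- then learn g on the transformed sample.
ofP :: Property l b c -> Property l a b -> Property l a c
ofP g f sample = g sample' . f'
  where
    f'      = f sample
    sample' = [ (f' x, l) | (x, l) <- sample ]

-- Concatenate the feature vectors produced by several properties.
andP :: [Property l a [Double]] -> Property l a [Double]
andP ps sample = \x -> concatMap ($ x) fs
  where
    fs = map ($ sample) ps

-- Properties that ignore the sample are built with const, as in the text.
total, peak :: Property l [Double] [Double]
total = const (\v -> [sum v])
peak  = const (\v -> [maximum v])

-- A learned property: rescale every component to [0, 1] using the range
-- observed in the (non-empty) sample.
rescale :: Property l [Double] [Double]
rescale sample = zipWith3 scale lows highs
  where
    vectors = map fst sample
    lows    = foldr1 (zipWith min) vectors
    highs   = foldr1 (zipWith max) vectors
    scale lo hi v
        | hi > lo   = (v - lo) / (hi - lo)
        | otherwise = 0

feature :: Property l [Double] [Double]
feature = rescale `ofP` andP [total, peak]

toySample :: Sample [Double] String
toySample = [([1, 2], "a"), ([3, 5], "b"), ([0, 0], "a")]
```

With this sample, feature toySample [3,5] gives [1.0,1.0] and feature toySample [0,0] gives [0.0,0.0], because rescale learns its ranges from the output of andP [total, peak] rather than from the raw objects.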

Future plans

  • Define more image processing functions (currently IO) as pure functions with reasonable values outside the valid ROI.
  • Improve the organization of Classifier.Base.
  • Better control of MPlayer (currently we only launch the process).
  • Improve the data types in the projective geometry modules.
  • SIFT and MSER.

Our research group is simultaneously working on a similar system written in C++/Qt.