This is an experimental Haskell system for fast prototyping of computer vision and image processing applications.
We take advantage of Haskell's expressive power without any performance loss, since most of the low level heavy numerical computations are done by specialized libraries:
Requirements
- hmatrix (new version of GSLHaskell) for matrix computations.
- HOpenGL for 2D and 3D graphics and the user interface.
- MPlayer for real time image grabbing and video decoding.
- Intel's IPP for fast image processing.
- Other useful software: segment extraction, SVMLight, etc.
Related work
There is an earlier approach by the Yale Haskell Group.
Download
- Documentation
- Source code: darcs get http://perception.inf.um.es/~aruiz/darcs/easyVision
- Browse the repository
- I have started a blog about this project.
- A tutorial will be available soon.
Disclaimer: this software is currently extremely provisional and unstable. It may not work "out of the box" and I do not recommend it for serious applications.
Installation
The system has been tested only on Ubuntu, but it should work on any Linux distribution. To use the modules that do not require IPP (Classifier and Vision), you only need to add -i/pathto/easyVision/lib to the ghc(i) command line (see the examples folder). The programs that require IPP (modules ImagProc and EasyVision) can be compiled with make. See (and modify as desired) the Makefile and the example programs in the compvis folder. You may need to export LD_LIBRARY_PATH=pathto/ipp/../sharedlib.
Image acquisition
Image acquisition is done by MPlayer, so you can try the examples below with your own camera, webcam or any kind of video file. We communicate with MPlayer through a standard Unix FIFO.
Some testing videos are available here. Note that:
- Many of them are in raw DV format, which is expensive to decode.
- In some cases a cheap TV card with an external video input may be a good alternative for live video capture from consumer DV cameras.
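As an illustration of what reading frames from a FIFO involves, here is a minimal sketch that parses the header of a YUV4MPEG2 stream and computes the number of bytes in each frame. The stream format is an assumption: the text above only says that a FIFO is used, and YUV4MPEG2 is simply one uncompressed format that MPlayer's yuv4mpeg video output can write. The names FrameSize, parseHeader and frameBytes are hypothetical and are not part of easyVision.

```haskell
module Main where

import Data.Maybe (listToMaybe)

-- | Frame size announced in a YUV4MPEG2 stream header.
data FrameSize = FrameSize { width :: Int, height :: Int }
  deriving (Eq, Show)

-- | Parse the first line of a YUV4MPEG2 stream, for example
--   "YUV4MPEG2 W320 H240 F25:1 Ip A1:1".
--   Assumption: the FIFO carries this format; the original text does not say so.
parseHeader :: String -> Maybe FrameSize
parseHeader line = case words line of
  ("YUV4MPEG2" : params) -> FrameSize <$> field 'W' params <*> field 'H' params
  _                      -> Nothing
  where
    -- First parameter that starts with the given tag and is followed by a number.
    field tag ps =
      listToMaybe [n | (c : rest) <- ps, c == tag, [(n, "")] <- [reads rest]]

-- | Bytes per frame, assuming planar YUV 4:2:0: one full-resolution luma
--   plane plus two chroma planes subsampled by two in each direction.
frameBytes :: FrameSize -> Int
frameBytes (FrameSize w h) = w * h + 2 * (((w + 1) `div` 2) * ((h + 1) `div` 2))

main :: IO ()
main = do
  let hdr = parseHeader "YUV4MPEG2 W320 H240 F25:1 Ip A1:1"
  print hdr
  print (fmap frameBytes hdr)
```

With the frame size known, a reader can pull fixed-size blocks from the FIFO, one per frame (each preceded by a short FRAME line in this format).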
Examples
simple image processing
The application demo.hs illustrates several typical image processing algorithms, which can be chosen from a menu. The interface to the IPP functions is aware of image ROIs (regions of interest), which can usually be selected with the mouse.
pose estimation
On-line estimation of camera position from live video (a webcam) given the view of an A4 sheet of paper:
compvis$ ./pose tv:// --focal 2.6
The 3D view can be changed with the mouse as a simulated trackball. Note that the camera model shows the live video in the image plane. You can also try the video contours/frontal.dv:
compvis$ ./pose frontal.dv --size 12
Once we estimate camera position we can add virtual objects to the scene:
compvis$ ./augmented tv://
This post shows a nice demo with "dynamical" virtual objects.
We detect long straight line segments (using a binding to the segment extractor developed by my friend Pedro E. López-de-Teruel), then we use graph algorithms in the standard Haskell libraries to find closed four-sided polygons, apply a geometric consistency check, and finally we optimize the camera parameters by minimizing the reprojection error.
See also the ARToolKit.
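The quantity minimized in the last step of the pose estimation pipeline can be sketched as follows. This is only an illustration using plain lists instead of hmatrix; the names project, reprojectionError, a4Corners and camera are hypothetical, and the camera model (focal length f, no rotation, sheet one unit in front of the camera) is an assumption chosen to keep the example short.

```haskell
module Main where

type Vec = [Double]
type Mat = [Vec]   -- row-major

-- | Matrix-vector product.
mulV :: Mat -> Vec -> Vec
mulV m v = [sum (zipWith (*) row v) | row <- m]

-- | Project a homogeneous 3D point with a 3x4 camera matrix and
--   return inhomogeneous image coordinates.
project :: Mat -> Vec -> (Double, Double)
project cam p = (x / w, y / w)
  where [x, y, w] = mulV cam p

-- | Sum of squared distances between observed and predicted image points.
reprojectionError :: Mat -> [(Vec, (Double, Double))] -> Double
reprojectionError cam matches =
  sum [ (u - u') ^ 2 + (v - v') ^ 2
      | (p, (u, v)) <- matches
      , let (u', v') = project cam p ]

-- | Corners of an A4 sheet (0.210 x 0.297 m) lying on the plane z = 0.
a4Corners :: [Vec]
a4Corners = [[0, 0, 0, 1], [0.210, 0, 0, 1], [0.210, 0.297, 0, 1], [0, 0.297, 0, 1]]

-- | Hypothetical camera K [I | t] with K = diag f f 1 and t = (0, 0, 1).
camera :: Double -> Mat
camera f = [[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 1]]

main :: IO ()
main = do
  -- Synthetic observations generated with f = 2.6.
  let observed = [(p, project (camera 2.6) p) | p <- a4Corners]
  print (reprojectionError (camera 2.6) observed)  -- model that generated the data
  print (reprojectionError (camera 2.0) observed)  -- wrong focal length
```

The first error is zero because the observations were generated with the same camera; an optimizer would adjust the camera parameters to drive this value down for real detections.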
stereo reconstruction
We first check that our algorithms work with synthetic data:
examples$ runhaskell -i../lib stereo.hs
Then we can try to do the same with real images taken from two webcams simultaneously connected to the computer. The following program computes interest points, finds tentative correspondences, applies RANSAC to remove outliers, obtains the fundamental matrix, self-calibrates the cameras under the assumption of internal parameters diag f f 1 with a common f, and rectifies the images so that corresponding points lie in the same row:
compvis$ ./autostereo tv:// 'tv:// -tv device=/dev/video1'
The next step would be dense disparity estimation and 3D reconstruction, but a lot of things must first be improved (better feature descriptors and initial correspondences, etc.).
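To make the role of the fundamental matrix and of rectification more concrete, here is a minimal sketch of the epipolar constraint x'ᵀ F x = 0 using plain lists. It is not the code used by autostereo: the names epipolar, rectifiedF and inliers are hypothetical, and the inlier test uses the simple algebraic residual, whereas a real RANSAC loop would normally use a geometric distance.

```haskell
module Main where

type Vec = [Double]
type Mat = [Vec]   -- row-major
type Point = (Double, Double)

dot :: Vec -> Vec -> Double
dot a b = sum (zipWith (*) a b)

mulV :: Mat -> Vec -> Vec
mulV m v = map (`dot` v) m

-- | Algebraic epipolar residual x'^T F x for a point (x, y) in the first
--   image and a point (x', y') in the second one.
epipolar :: Mat -> Point -> Point -> Double
epipolar f (x, y) (x', y') = [x', y', 1] `dot` mulV f [x, y, 1]

-- | Fundamental matrix of an ideally rectified pair: the epipole is at
--   infinity along the x axis, so the residual reduces to y - y'.
rectifiedF :: Mat
rectifiedF = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]

-- | Keep the correspondences whose residual is below a tolerance.
inliers :: Double -> Mat -> [(Point, Point)] -> [(Point, Point)]
inliers tol f = filter (\(p, q) -> abs (epipolar f p q) < tol)

main :: IO ()
main = do
  let matches = [((10, 20), (35, 20)), ((10, 20), (35, 23))]
  print [epipolar rectifiedF p q | (p, q) <- matches]
  print (inliers 0.5 rectifiedF matches)
```

For the rectified matrix the residual is just the row difference, which is why correspondences end up in the same row after rectification.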
planar metric rectification
This was the first "serious" application of the system: automatic rectification of a planar scene from several perspective views (Ruiz et al., BMVC 06).
Currently the correspondences are manually selected by the user. We are working on a fully automatic, real-time version of this application.
Note:
The solution to this problem involves a moderately complicated optimization process in which several intermediate stages may give meaningless results. I switched to Haskell after some unfortunate attempts to solve it with another language. Then I started the easyVision project since I wanted to try the proposed method with real image sequences. In August 2006 I finally decided to use Haskell for all my projects.
camera combinators
Using Haskell we can easily define camera combinators :: IO Image -> IO (IO Image), i.e., functions which admit cameras and produce "virtual" cameras: each call (grab) returns an image which depends on the infinite (lazy) sequence of images generated by the input camera.
Several virtual cameras are combined to get any desired effect. For instance, the example program interpolate.hs obtains the following result on the video contours/big_plate.dv:
compvis$ ./interpolate big_plate.dv
This effect is achieved by the following composition:
(cam,ctrl) <- getCam 0 size
          >>= monitorizeIn "original" (Size 150 200) id
          >>= asFloat
          >>= drift alpha
          >>= interpolate
          >>= withPause
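The combinator type IO Image -> IO (IO Image) and the >>= chaining used in the composition above can be illustrated with a small self-contained sketch. Everything here is hypothetical: Image is a stand-in list of pixel values, counterCam is a synthetic camera, and mapCam and smooth are toy combinators, not the ones in easyVision. Note also that this sketch keeps its state in an IORef, whereas the description above is based on a lazy infinite sequence of images.

```haskell
module Main where

import Data.IORef (atomicModifyIORef', newIORef, readIORef, writeIORef)

-- | Stand-in for a real image type: a flat list of pixel values.
type Image = [Double]

-- | A camera is an action that grabs the next frame.
type Camera = IO Image

-- | Hypothetical synthetic camera: frame n is a two-pixel image filled with n.
counterCam :: IO Camera
counterCam = do
  ref <- newIORef 0
  return $ atomicModifyIORef' ref (\n -> (n + 1, [n, n]))

-- | Stateless combinator: apply a pure function to every frame.
mapCam :: (Image -> Image) -> Camera -> IO Camera
mapCam f cam = return (fmap f cam)

-- | Combinator with memory: each grab returns the average of the new
--   frame and the previous one.
smooth :: Camera -> IO Camera
smooth cam = do
  first <- cam              -- consume one frame to initialise the memory
  prev  <- newIORef first
  return $ do
    new <- cam
    old <- readIORef prev
    writeIORef prev new
    return (zipWith (\a b -> (a + b) / 2) old new)

main :: IO ()
main = do
  cam    <- counterCam >>= smooth >>= mapCam (map (* 10))
  frames <- sequence [cam, cam, cam]
  print frames
```

The outer IO layer is where a combinator sets up its private state; the inner IO action is the virtual camera itself, so combinators compose with >>= exactly as in the pipeline above.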
pattern classification combinators
The easyVision system includes a few pattern classification methods. Here are the results of some of them on an illustrative toy problem.
examples$ runhaskell -i../lib classdemo.hs
Using higher-order functions we can easily define complex machines by combining pattern recognition (meta) algorithms. For instance, the bottom-right solution is obtained by an AdaBoost of decision trees of perceptrons (!). We also provide feature extraction combinators. For instance, in a real application we recognize different kinds of objects in selected frames of a video sequence by a combination of color, texture, and eigenspace-based features:
color = const $ percentilsYUV [0.25,0.5,0.75]
texture = const highpassAverage
appearance = const $ small' (Size 20 20)
We can directly define a classifier using some of these features:
machine = distance gaussian `onP` andP [color, appearance]
Or better, we can define a classifier which combines the outputs of other classifiers:
machine = distance nearestNeighbour `onP` features
features = andP [clColor, clAppearance]
clColor = outputOf (distance ordinary)
          `ofP` normalizeMinMax (0,1)
          `ofP` andP [color, texture]
clAppearance = outputOf (multiclass mse)
               `ofP` outputOf (distance (subspace (ReconstructionQuality 0.5)))
               `ofP` appearance
Combining feature extractors is similar to ordinary function composition, except that each combinator takes as its first argument a sample of the problem, [(object, label)], which is used to build (learn) the feature (object1 -> object2) or the classifier (object -> label).
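A minimal sketch of this idea, under stated assumptions, could look like the following. The primed names ofP', onP' and andP' are hypothetical re-implementations written for this example and may differ from the definitions in the Classifier modules; labels are plain strings, feature vectors are lists, and minMax and nearest are toy stand-ins for a normalizer and a classifier.

```haskell
module Main where

import Data.List (minimumBy)
import Data.Ord (comparing)

type Sample a     = [(a, String)]             -- labelled examples
type Property a b = Sample a -> (a -> b)      -- learns a feature extractor
type Learner a    = Sample a -> (a -> String) -- learns a classifier

-- | Compose two learned features: the outer one is trained on the
--   sample as transformed by the inner one.
ofP' :: Property b c -> Property a b -> Property a c
ofP' outer inner sample = outer sample' . f
  where f       = inner sample
        sample' = [(f x, l) | (x, l) <- sample]

-- | Train a classifier on top of a learned feature.
onP' :: Learner b -> Property a b -> Learner a
onP' learner prop sample = learner sample' . f
  where f       = prop sample
        sample' = [(f x, l) | (x, l) <- sample]

-- | Concatenate the outputs of several vector-valued features.
andP' :: [Property a [Double]] -> Property a [Double]
andP' props sample = \x -> concatMap ($ x) fs
  where fs = map ($ sample) props

-- | Toy feature: rescale each component to [0,1] using the range seen
--   in the (non-empty) sample.
minMax :: Property [Double] [Double]
minMax sample = zipWith3 scale los his
  where vs  = map fst sample
        los = foldr1 (zipWith min) vs
        his = foldr1 (zipWith max) vs
        scale lo hi v | hi > lo   = (v - lo) / (hi - lo)
                      | otherwise = 0

-- | Toy learner: label of the nearest training example.
nearest :: Learner [Double]
nearest sample x = snd (minimumBy (comparing (dist x . fst)) sample)
  where dist a b = sum (zipWith (\p q -> (p - q) ^ 2) a b)

trainingSet :: Sample [Double]
trainingSet = [([0, 0], "a"), ([1, 0], "a"), ([10, 10], "b"), ([9, 10], "b")]

-- | Features that ignore the sample are written with const, as above.
machine :: Learner [Double]
machine = nearest `onP'` (minMax `ofP'` andP' [const id, const (\v -> [sum v])])

main :: IO ()
main = do
  let classify = machine trainingSet
  mapM_ (putStrLn . classify) [[8, 9], [1, 1]]
```

The key design point is that every stage receives the training sample already transformed by the stages below it, so a normalizer or a classifier used as a feature is always learned in the right feature space.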
Future plans
- Define more image processing functions (currently IO) as pure functions with reasonable values outside the valid ROI.
- Improve the organization of Classifier.Base.
- Better control of MPlayer (currently we only launch the process).
- Improve the data types in the projective geometry modules.
- SIFT and MSER










