CamTrack Visual Face Tracking

CamTrack is a system to visually track the movements of a user's head as captured using a webcam. These movements are analysed to generate event responses which are communicated to X and GOK via an emulated mouse. GOK can then be used to navigate the user interface in a highly configurable way.

This project is one of 11 within GNOME (see SummerOfCode) supported by Google's Summer of Code program. Thank you Chris, Natalie & co. at Google, and jrb, seth, luis, clarkbw & co. at GNOME!

CamTrack is currently alpha software; portions of the system are incomplete or missing. Feature requests, complaints, bug reports and general comments are welcomed.


  • Tracks face position in three degrees of freedom
  • Interfaces directly to X as an additional pointing device
  • Controls x- and y-movement of pointer or GOK control point
  • Generates selection events in response to head motion or 'dwelling'
  • Supports ieee-1394 (firewire) webcams and standard firewire digital cameras
  • Supports video4linux capture devices (including USB webcams)

In the pipeline...

  • Easy-to-use configuration & calibration utility
  • Eye and mouth tracking, allowing control by facial orientation
  • Generalised face gesture recognition


Not much going on here at the minute, I know, but bear with me! Starting a new job and learning all about a new industry are proving pretty busy work... I'm also waiting for dropline GNOME to include GStreamer 0.10.1 (see below) and deciding on the best way forward for the system.

A reliable and well-supported video capture system is crucial to the usability of the system, so I've been planning a library to interface with as many v4l and firewire cameras as possible. Until now GStreamer has only supported v4l cameras properly, so I have been dealing with libdc1394 and v4l directly. However, I've been looking at the new stable GStreamer 0.10.1, and it now seems to support firewire and v4l sources as well as v4ljpeg (OV519) cameras. This makes a new capture library redundant, so it looks like CamTrack is going to become a GStreamer app (to some extent) from now on.


The system uses reference data compiled from sampled images to classify individual pixels as representative of skin or non-skin regions based on hue. A Bayesian 'Maximum Likelihood' classifier is used. This generates a map showing the skin-coloured regions in the captured frame.
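As a rough illustration of this step (a sketch, not CamTrack's actual code), the classifier can be written as two hue histograms and a per-pixel comparison. The bin count, the add-one smoothing and all function names here are assumptions made for the example; with equal priors, picking the class whose histogram gives the higher likelihood is the maximum-likelihood decision.

```python
import colorsys

BINS = 32  # number of hue bins; an arbitrary choice for this sketch


def hue_bin(rgb):
    """Map an 8-bit (r, g, b) pixel to a hue histogram bin."""
    r, g, b = (c / 255.0 for c in rgb)
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return min(int(h * BINS), BINS - 1)


def build_histogram(pixels):
    """Normalised hue histogram approximating P(hue | class).

    Add-one smoothing keeps unseen hues from having zero likelihood.
    """
    counts = [1.0] * BINS
    for p in pixels:
        counts[hue_bin(p)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def classify(frame, skin_hist, non_skin_hist):
    """Return a binary map: 1 where skin is the more likely class, else 0."""
    return [
        [1 if skin_hist[hue_bin(p)] > non_skin_hist[hue_bin(p)] else 0 for p in row]
        for row in frame
    ]
```

In use, `build_histogram` would be called once on hand-labelled skin pixels and once on background pixels from the reference images, and `classify` would then run on every captured frame.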

The resulting map is processed to reduce noise and improve the definition of the face region.
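The kind of clean-up involved can be sketched with a 3x3 majority filter, which removes isolated marked pixels and fills small holes. This is only one plausible choice, picked here for illustration; the function name and the threshold of 5 votes are assumptions, and the real processing may differ.

```python
def majority_filter(mask):
    """3x3 majority vote over a binary map.

    Neighbours that fall outside the map count as 0, so regions
    touching the border are eroded slightly.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        votes += mask[ny][nx]
            # Keep the pixel only if at least 5 of the 9 cells are marked.
            out[y][x] = 1 if votes >= 5 else 0
    return out
```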

The most significant marked region in the map is then tracked using an iterative mean-shift algorithm.
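A minimal mean-shift sketch over a binary map might look like the following. It assumes a fixed-size window with odd width and height and a flat (unweighted) kernel; the function name, the window representation and the iteration limit are all illustrative rather than taken from CamTrack.

```python
def mean_shift(mask, window, max_iter=20):
    """Shift a fixed-size window onto the centroid of the marked pixels it covers.

    window is (x, y, w, h) with x, y the top-left corner; w and h are
    assumed to be odd so the window has a well-defined centre pixel.
    """
    x, y, w, h = window
    rows, cols = len(mask), len(mask[0])
    for _ in range(max_iter):
        total = sx = sy = 0
        for yy in range(max(0, y), min(rows, y + h)):
            for xx in range(max(0, x), min(cols, x + w)):
                if mask[yy][xx]:
                    total += 1
                    sx += xx
                    sy += yy
        if total == 0:
            break  # nothing marked inside the window; give up
        # Re-centre the window on the centroid of the marked pixels.
        nx = round(sx / total) - w // 2
        ny = round(sy / total) - h // 2
        if (nx, ny) == (x, y):
            break  # converged
        x, y = nx, ny
    return x, y, w, h
```

Starting each frame's search from the previous frame's window is what turns this into a tracker: the window only has to overlap the face region slightly to be pulled onto it.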

The motion of the tracking window is recorded and temporally smoothed, and the resulting values used to trigger events based on the position or velocity of the imaged face.
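One simple way to do this, shown here purely as a sketch, is an exponential moving average for the smoothing plus a counter that fires a selection when the smoothed position stays put for long enough ('dwelling'). The class name, the parameters and their default values are assumptions for the example.

```python
class MotionFilter:
    """Smooth tracked positions and report a 'select' event on dwell."""

    def __init__(self, alpha=0.5, dwell_radius=3.0, dwell_frames=5):
        self.alpha = alpha                # weight given to the newest sample
        self.dwell_radius = dwell_radius  # pixels of tolerated jitter
        self.dwell_frames = dwell_frames  # frames needed to trigger a selection
        self.pos = None
        self.anchor = None
        self.count = 0

    def update(self, x, y):
        """Feed one tracked position; return (smoothed_x, smoothed_y, event)."""
        if self.pos is None:
            self.pos = (float(x), float(y))
            self.anchor = self.pos
        else:
            a = self.alpha
            self.pos = (a * x + (1 - a) * self.pos[0],
                        a * y + (1 - a) * self.pos[1])
        dx = self.pos[0] - self.anchor[0]
        dy = self.pos[1] - self.anchor[1]
        if dx * dx + dy * dy <= self.dwell_radius ** 2:
            self.count += 1
        else:
            # Moved away: restart the dwell timer at the new position.
            self.anchor = self.pos
            self.count = 0
        if self.count >= self.dwell_frames:
            self.count = 0
            return self.pos[0], self.pos[1], "select"
        return self.pos[0], self.pos[1], None
```

A velocity-based trigger (for example, a quick nod) could be built the same way by comparing successive smoothed positions instead of counting stationary frames.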

CamTrack provides its output by emulating a serial mouse, using a FIFO instead of a device file, so that X11, GOK or any system supporting standard mouse protocols can easily interface with the tracker.
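To give a feel for what travels through the FIFO, here is a sketch that encodes a movement in the classic three-byte Microsoft serial mouse format and writes it to a named pipe. The choice of the Microsoft protocol is an assumption (the README describes the actual X11 configuration), and the function names and the path passed to `send_motion` are hypothetical.

```python
import os


def ms_mouse_packet(dx, dy, left=False, right=False):
    """Encode one movement as a 3-byte Microsoft serial mouse packet.

    Byte 0: sync bit (0x40), button bits, and the top two bits of dy and dx.
    Bytes 1 and 2: the low six bits of dx and dy respectively.
    """
    dx = max(-128, min(127, dx)) & 0xFF  # clamp, then two's complement
    dy = max(-128, min(127, dy)) & 0xFF
    b0 = (0x40
          | (int(left) << 5)
          | (int(right) << 4)
          | (((dy >> 6) & 0x03) << 2)
          | ((dx >> 6) & 0x03))
    return bytes([b0, dx & 0x3F, dy & 0x3F])


def send_motion(fifo_path, dx, dy, left=False):
    """Write one packet to a FIFO, creating it if needed (path is hypothetical).

    Opening the FIFO for writing blocks until a reader, such as the
    X server, has opened the other end.
    """
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)
    with open(fifo_path, "wb") as fifo:
        fifo.write(ms_mouse_packet(dx, dy, left=left))
```

Because the reader just sees a byte stream in a standard mouse protocol, it does not need to know that the source is a tracker rather than real hardware.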


You will need to have X11 and a compatible graphical environment installed in order to compile and use this software. In addition, the tracking program and associated utilities make use of the Imlib2 library. libraw1394 and libdc1394 are used to capture frames from firewire webcams. An image manipulation program, such as the GIMP, is currently required for preparation of template data.

If you want to track from live camera input, as opposed to a video stream stored on disk, you'll need an ieee-1394 (firewire) webcam (or industrial camera if you're lucky enough to have one), or a USB webcam or video capture card supporting video4linux.

GOK is a versatile on-screen keyboard and GUI navigation system, and usually the best way to make use of CamTrack's output.

Get CamTrack!

The software is currently available from GNOME CVS.


The README file included with the source code provides additional information, including instructions for generating reference data and configuring X11 to recognise your tracker.

Common Problems

There are two situations you should try to avoid as they will interfere with the tracking system. These issues both arise because the algorithm identifies facial regions by hue.

  • Inappropriate Lighting

The face should be evenly lit, ideally with white light, for tracking to function well. Lighting from the side, above or below the face will result in tracking errors as the system will lock onto either the lit or shadowed areas and ignore the others. Coloured lighting may confuse the algorithm, as will very bright directional light on the face or very dim lighting. An even level of ambient white light is best.

  • Confusing Background Colours

If lighting is less than optimal, the tracker may be oversensitive to flesh-like colours in the background. It's not a good idea to try to use the system whilst sitting in front of a beige wall with a pink uplight, for example. However, if the face is well lit, this should rarely be a problem.


Q: Any recommendations for webcams that work better with CamTrack, or are they about the same? Or is it too early to tell?

A: Well, my D-link DRF-350C (basically a cheap & cheerful alternative to an iSight) works really well, but it's out of production and firewire cams are quite hard to come by now unless you want to shell out for an iSight. A similar one to keep an eye out for is the Unibrain Fire-i. Don't pay more than £25 or so though, since you should be able to get your hands on an iSight for not much more than that. I've used a Sony DFW-V500 and they're great if you want to spend the best part of £1000!

The above are all firewire cams, and can stream uncompressed RGB24 at high speeds. USB cams (even USB 2.0 ones, as far as I know) will not currently work as well since they compress the frames lossily and discard some of the colour information that the tracking system uses. However, since my firewire hub decided to die a couple of weeks ago I've been using an Omnivision OV519-based USB webcam quite successfully. This chip (used in many cheap USB webcams, as well as the EyeToy) compresses each frame as a low-quality jpeg, resulting in pretty rough image quality. The tracker seems to cope pretty well with this, but a bit more care in producing the reference histograms is usually necessary for good performance. I'm working on making the system use motion information to generate its own reference data at startup, and adapt to changes in lighting, updating its histograms on-the-fly, but this is still in its very early stages.

The tracker currently requires that camera drivers can produce RGB24 frames. It'll select the highest resolution that supports this format and use that. Much more configuration and support for a much wider variety of hardware are what I'd like to add to the software at the minute, since it currently supports a minimal set of format/resolution combinations.
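The selection rule described above amounts to something like the following sketch. The `(width, height, format)` tuples and the function name are made up for the example and do not reflect how the tracker actually represents camera modes.

```python
def pick_mode(modes, required_format="RGB24"):
    """Return the largest-area mode offering the required pixel format.

    modes is a list of (width, height, format) tuples; returns None
    if the camera offers no mode in the required format.
    """
    candidates = [m for m in modes if m[2] == required_format]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[0] * m[1])


# Example: a camera that offers its largest mode only in a YUV format.
modes = [
    (1024, 768, "YUV411"),
    (640, 480, "YUV422"),
    (640, 480, "RGB24"),
    (320, 240, "RGB24"),
]
best = pick_mode(modes)  # the 640x480 RGB24 mode
```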

If you're finding that the face regions aren't being recognised very well, try building reference histograms from a couple of images captured just before you want to use the camera. If that doesn't improve performance, I might need you to capture a series of frames and send them to me for debugging.

Q: So it's similar to Head-Tracking Pointer from IBM?

A: Yes, it is quite similar in function, although the underlying implementation is probably very different. I think IBM's system is based on tracking image features under affine transformations (the weak perspective approximation). Although in the long run I'd like to incorporate my own feature tracking subsystem into CamTrack where appropriate, I'm trying to take it as far as I can using only colour analysis for now.

Also, CamTrack is designed to interface neatly with GNOME's accessibility tools such as GOK, allowing it to function as part of a versatile and configurable keyboard/pointer alternative.

Q: Why don't you use Intel's OpenCV library?

A: I thought until recently that OpenCV required another of Intel's libraries, the Integrated Performance Primitives (IPP), which would cause licensing problems. I've since been corrected. In any case, having to install OpenCV is a bit on the heavy side for something aimed at being a small(ish) component of the UI, especially since one of the reasons for this project is so that people wanting to use a perceptual interface don't have to get their hands on an expensive commercial product. Maybe if the CV algorithms get more intensive, I'll look again.

Q: Why do you use Imlib2 and not <...> image manipulation library?

A: Imlib2 is fast, and it does my colourspace conversions for me.

Q: I mean this quite seriously: what if you're black? Indian, or on the dark side of Southeast Asian, even. Do you plan to develop special modules for different profiles, or will the algorithms be refined in time to, say, make an average of the hue differences? -- Auk

A: It depends... There are 2 parts to my answer for this one, so bear with me!

1) The system works (in theory) by separating hue from the other two colour components (saturation and lightness/value, in this model). This means that the detection of skin by hue is largely insensitive to colour saturation. Now, the colour in your skin arises because of the presence of the pigment melanin. This same pigment is always responsible for skin colour so your skin is always the same hue no matter what your skin tone is like - the only variation is in the concentration of melanin in your skin, which corresponds to changes in saturation. So in fact, I can confidently say your skin is the same hue as mine, just more or less saturated. Try taking some photos of a number of people with different skin tones and extracting just the hue component - you'll find that everyone's skin colour has pretty similar hue values.

However, in practice this doesn't always work so well. One culprit is the fact that most digital image formats garble the colour information slightly, and another is that my HSV colour representation isn't so accurate at most lighting levels, due to a kind of conical mapping from the YUV or RGB data I'm working with. A third is that the HSV model itself isn't quite orthogonal in practice - for reasons I haven't really figured out yet, the hue channel doesn't seem to be completely independent of saturation for most images. The final big problem is with two special cases: firstly, true albinos have no melanin, so the hue itself will be very different. Secondly, if your skin tone is dark enough that the RGB value is close to 0-0-0, there's no data to calculate a hue from. Both are serious problems. I'm currently tinkering with alternative algorithms to try and address these.
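Both points can be checked numerically with a short sketch using Python's standard `colorsys` module (this is an illustration, not CamTrack's code, and the RGB values are invented for the example rather than measured from skin). Dimming a colour or adding grey to it leaves the hue alone, whereas for near-black 8-bit pixels a single quantisation step swings the hue wildly.

```python
import colorsys


def hue_degrees(r, g, b):
    """Hue of an 8-bit RGB colour, in degrees (0 for greys, by convention)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0


# Same hue at different brightness and saturation levels.
base = hue_degrees(200, 150, 100)    # about 30 degrees
dimmer = hue_degrees(100, 75, 50)    # half the brightness, same hue
paler = hue_degrees(220, 170, 120)   # less saturated, same hue

# A one-step change in one channel barely matters for a bright pixel...
bright_nudged = hue_degrees(200, 151, 100)  # about 30.6 degrees

# ...but dominates for a pixel close to 0-0-0.
dark = hue_degrees(4, 3, 2)          # about 30 degrees
dark_nudged_a = hue_degrees(4, 3, 3)  # 0 degrees
dark_nudged_b = hue_degrees(4, 4, 2)  # about 60 degrees

# Pure black has no hue at all; colorsys reports 0.
black = hue_degrees(0, 0, 0)
```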

2) You train the system using captured images of your own face, using your own camera. Usually only 3 frames or so are required to give reasonable performance. This requirement really springs from the fact that different cameras garble colour information in different ways, but an added bonus is that the system is tailored to your own skin hue. Having said that, once it's trained for a particular camera it should track almost anybody.

As an example, a former research colleague of mine has a pretty dark Indian skin tone, and it tracked her easily, even though it was trained only with images of my (very pale) face.


If you have a question that should be answered above, or a comment on the system or this page, it can go here, in 'Comments?'


Attic/CamTrack (last edited 2013-11-22 21:57:37 by WilliamJonMcCann)