Presence - Gaze Detection Version 1

Grad school project
installation, machine learning
Python, Fabrication, Hardware

"Presence" is a kinetic sculpture that uses computer vision to watch how a person stands or moves, and then changes its own shape to match the person's body. It was on display at the 2018 Maker Faire event.

View post on Instagram

Our eyes are windows into our interests and intentions. This piece explores new ways to communicate with art using these indicators. The viewer's perception becomes part of a feedback loop, as when it changes, the shape of the installation responds, which in turn alters the viewer's perception.

Why eye-gaze tracking?

Traditionally, our interface with computers has been limited to our hands, using controls such as a mouse, a touch screen or a keyboard. As stated in the research paper Eye Tracking in Advanced Interface Design:

Current technology has been stronger in the computer-to-user direction than user-to-computer, hence today’s user-computer dialogues are rather one-sided, with the bandwidth from the computer to the user far greater than that from user to computer. Using eye movements as a user-to-computer communication medium can help redress this imbalance.

While Presence explores an artistic form of using eyes as a communication medium, the ability to use our gaze to control an interface opens up a wide array of possibilites, such as:

  • Enabling someone who is physically disabled to use their eyes instead of a mouse
  • Being able to look at the bottom of an article or text and have the screen scroll down automatically
  • When having multiple tabs in an editor open, allowing a user to switch tabs and focus using their eyes
  • Measuring focus and interest

How does it work?

Until recently, being able to track gaze has been both expensive and cumbersome, as it requires either specialized, pricey hardware that needs to be calibrated for each user, or specific types of cameras that are often unreliable. Researchers at MIT and University of Georgia proved it can be done with any webcam in their research paper called Eye Tracking for Everyone.

In this paper they trained a neural-network on about 2.4 million frames of video of people looking at a dot on a screen on different devices. This neural-network was able to achieve up to around 2 cm accuracy when predicting gaze.

Presence uses the pre-trained model provided by the researchers to track gaze in real-time. It detects faces and eyes use OpenCV, then forwards these detection through the neural-network model using Caffe, which then outputs the gaze positions in centimeters relative to the camera. A Processing sketch reads these positions, generates an animation, and converts the animation into servo position byte instructions which are sent over serial to a servo controller.


The installation consists of 21 Futaba S3004 servos controlled by a Mini-Maestro 24-Channel USB Servo Controller, which is connected over usb to a PC with a Nvidia gtx970 gpu. A basic webcam is connected over usb to the PC. A 5V 8A power supply powers the servos.

The servos are mounted to a piece of laser-cut acrylic, which is attached to a CNC-routed Plywood frame. White shippings tubes with electrical-tape spirals are mounted to each motor using CNC-routed end-caps attached to mounting hubs. Steel screws which are fastened to the top of the frame provide horizontal support to the tubes and allow them to rotate freely.