Visual Attention in Virtual Reality

Kat
Oct 23, 2018

Updated: Oct 30, 2018

head mounted virtual reality display with eyetracker

When designing interfaces that use visual attention, we must remember that our brains do not shoot and store perfect photos of the world around us like cameras - instead they process and filter our visual world to help us take the right actions to survive. When optimizing the human mind for survival, less is more. Human brains have certain physical size and processing limitations. They take mental shortcuts to speedily determine if someone is friend or foe, if something seems edible, if food is nearby. These mental shortcuts, called heuristics, work most of the time but sometimes they fail.

We’ve all experienced “looking but not seeing.” Sometimes the consequences are harmless, even funny - for example, accidentally walking into something while thinking about something else, or reading the same line of a book over and over again when tired. Other times they can be fatal - like a passenger’s conversation distracting a driver from noticing a pedestrian crossing the street.

left frame shows human attention has one spotlight of attention. right frame 360 camera has multiple spotlights of attention.

The above blunders are the natural result of a balancing act, an evolutionary optimization of tradeoffs in attention to maximize performance on survival tasks. For example, our fovea gives us good resolution at the center of our gaze, but our optic nerve does not have the capacity to carry the same resolution across the entire visual field to the brain for processing. (Compare this to a self-driving car with 360-degree cameras that view equally well from all angles.) Essentially our bodies have made a tradeoff in resolution - really good resolution within the center of our gaze and worse resolution on the periphery. For this reason, our brains have learned to identify which areas around us need our immediate gaze, and to shift gaze when the most strategic location changes. Within this strategic framework, any so-called mistakes - such as not noticing someone waving at you from the side while you are focusing on shaping a clay pot with your hands - can be considered necessary sacrifices of attention to less important goals. Ignoring the wave leaves you free to notice relevant changes in the clay pot, and to fix anything that is going wrong in a timely manner. The less you notice outside distractions, the better the clay pot. So, complete, representative information about the world around you is not necessary for you to take proper action to complete your goal.

Exploring the details of these visual attention tradeoffs helps us test our design assumptions. For example, we may assume we need to attract gaze (“eyeballs”) to an object to get people to notice it and want to buy it. We may have originally attracted eyeballs by making all important objects bright, big or animated - like websites inundated with pop-up ads. This quickly gets overwhelming and doesn’t necessarily lead to true visual processing of the ads. In fact, people can learn to ignore distracting animated ads after a lot of experience with them. Even worse, like in the clay pot example, trying to attract their attention to outside objects (ads) can make their actual job harder, so this solution is a loss for everyone.

Once thinking about the “looking but not seeing” phenomenon makes us question our original design goal, we may choose a better design goal: getting important objects or alerts into visual memory. Gaze isn’t enough to cause people to take action on important alerts or objects. Our brains need to process these objects somehow, which involves getting the objects into visual memory. So, it’s essential to understand the difference between attracting gaze and creating visual memory. My graduate lab studied the positive correlation between visual memory for objects and the task-relevance of these objects. (In plain English, we’re more likely to remember objects relevant to what we’re doing than anything else, even other objects we see every day.) Designers have learned to harness this phenomenon by bringing objects into a narrative, because narrative is effective at creating memories. An example of this is when TV show writers incorporate a certain brand of car into the plot line - people are more likely to remember the brand of car if it had something to do with the plot than if it was simply visible in the background of the shot for a very long time.

left frame puts spotlight of attention on a bandit running away. right frame puts spotlight of attention on bandit in car driving across chasm. The car is inside the narrative on the right, but not the left.

To alter our routine visual interface design assumptions, we ask some very basic questions: What are we consistently good at noticing? When and why do we fail to process important visual information? Once we understand that, how do we make sure we actually notice critically important objects and actions? So, we recreate the visual tradeoffs to force the attention “mistakes” in laboratory settings to understand why they happen, when they are most likely to happen, and when the consequences are most dramatic.

My graduate school research lab looked at visual behavior in familiar places. People spend most of their time in a small number of locations - at work, at home, and possibly at their own local hangout (for example, a coffeeshop, church, library, or park). We hypothesized that visual behavior would differ between familiar and unfamiliar environments, because of the expectations we build up inside familiar environments. For example, you should be more likely to notice a difference if an object changes size, location or color (while your back is turned) in a familiar environment than in a new one. In a new one, you’re so busy taking in and learning the environment that you would probably not notice any changes during that time. Even if you did, you would simply think you misremembered them in the first place. You haven’t built up enough expectations of that environment for confidence in what is normal. In contrast, you’d be likely to notice whether a home object you’re familiar with became a different color while you were out of the room. However, you may be less likely to notice if your keys have moved from one of their usual resting places to another (say, from a countertop to a desktop) while your back is turned, because that change actually fits within your built up expectations. Good design could use this to its advantage.

To reach the above findings, my lab used 3D walkable virtual reality environments in natural settings, such as an apartment or a city block, to study people’s visual attention while doing natural activities such as picking things up. Our research goals were to:

show that changes to task-irrelevant objects can attract gaze (Back to the clay pot example: although visual changes irrelevant to making the pot should be harder to notice, they should not be completely impossible.)
show that more experience in a place makes changes more likely to attract gaze (the more you understand what your office should look like, the more you notice when it doesn’t fit that expectation)
show that off-task glances don’t lead to memory of objects (the looking but not seeing phenomenon - if it’s not important to you, your gaze alone won’t make you notice or remember it)

We familiarized people with environments such as an outdoor city scene where they walked multiple laps around the block, and an indoor 3-room apartment where they located and reached out their hand to “touch” household objects. We measured how familiarity with an environment affected visual attention to both new objects and familiar objects. Familiarity could be the total amount of time spent in a household environment or, alternatively, the number of times someone touched an object. We measured visual attention by recording visual fixations: short glances at a point in space that last around 100-300 milliseconds. Fixation data can show you how quickly someone looks at a changed object, how long they look at it, and how likely they are to look at it in the first place. Intuitively, attraction of gaze to a task-irrelevant object shows our brain detecting an unexpected change.
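To make the three fixation measures above concrete, here is a minimal Python sketch of how they might be computed for one changed object. It is not the lab's actual analysis code: the `Fixation` record, the `summarize_gaze` function, and the idea that each fixation arrives already labeled with the object it landed on are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    """One fixation, already labeled with the object it landed on (assumed)."""
    start_ms: float
    duration_ms: float
    target: str


@dataclass
class GazeSummary:
    looked: bool                 # did gaze ever land on the object after the change?
    latency_ms: Optional[float]  # time from the change to the first fixation on it
    dwell_ms: float              # total fixation time on it after the change


def summarize_gaze(fixations: List[Fixation], target: str, change_ms: float) -> GazeSummary:
    """Summarize gaze on `target`, counting only fixations that start after the change."""
    hits = [f for f in fixations if f.target == target and f.start_ms >= change_ms]
    if not hits:
        return GazeSummary(looked=False, latency_ms=None, dwell_ms=0.0)
    first_start = min(f.start_ms for f in hits)
    total_dwell = sum(f.duration_ms for f in hits)
    return GazeSummary(looked=True, latency_ms=first_start - change_ms, dwell_ms=total_dwell)


# Made-up example: the whisk changes color 300 ms into the recording.
example = [
    Fixation(0, 200, "coffeemaker"),
    Fixation(100, 100, "whisk"),   # before the change, so it is not counted
    Fixation(500, 150, "whisk"),
    Fixation(900, 250, "whisk"),
]
summary = summarize_gaze(example, "whisk", change_ms=300)
```

Averaging `looked` over many participants would give the likelihood of looking at the object, while `latency_ms` and `dwell_ms` correspond to how quickly and how long people look.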

diagram of the experimental design. a new object appears on lap 2 or 16. person fixates the object on lap 16, but not 2

We found that amount of time in an environment affected how likely someone was to look at a new object.

diagram of the experimental design. Image caption reads "please touch the coffeemaker." in the left frame, before the color change, attention is on the coffeemaker. After the whisk changes color in the right frame, attention is on the whisk.

We found that off-task changes, such as a change in the color of an object, can attract gaze. We also found that people learned about these objects via tasks rather than off-task exploration.

a person throws a crumpled piece of paper named "things they looked at" in the garbage. Meanwhile, a thought bubble above their head reads "things they touched"

The best visual interfaces for embodied tasks will be designed to guide action rather than to show the most detailed view of the world. Choosing the correct action is paramount - information display should be in service of that goal. Our visual system needs only partial information to choose actions important to survival, because our brain is good at filling in the blanks. In fact, showing detailed information often distracts rather than aids. People sometimes ask designers to make interfaces as detailed, aesthetic, or comprehensive as possible, even when that does not improve user behavior. When testing interface alternatives, the design goal should be to maximize how often the human and interface together take the correct action, rather than how often the interface provides the correct information.
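As a rough illustration of that testing goal, the Python sketch below scores two interface variants on both measures and picks the winner by joint action rate. The `Trial` record, the function names, and the trial data are hypothetical, invented only to show how the two metrics can disagree.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Trial:
    info_correct: bool    # did the interface display correct information?
    action_correct: bool  # did the human plus interface take the correct action?


def rates(trials: List[Trial]) -> Tuple[float, float]:
    """Return (information accuracy, joint action accuracy) for one interface."""
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    info_rate = sum(t.info_correct for t in trials) / n
    action_rate = sum(t.action_correct for t in trials) / n
    return info_rate, action_rate


def pick_interface(results: Dict[str, List[Trial]]) -> str:
    """Choose the variant with the highest joint action accuracy."""
    return max(results, key=lambda name: rates(results[name])[1])


# Made-up data: the detailed display is always informative but often distracts.
results = {
    "detailed": [Trial(True, True), Trial(True, False), Trial(True, False), Trial(True, True)],
    "sparse": [Trial(True, True), Trial(True, True), Trial(False, True), Trial(True, True)],
}
best = pick_interface(results)
```

In this invented data the detailed display wins on information accuracy, yet the sparse display is selected because people acted correctly more often with it.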

Let’s think about an example using car dashboards: in an ideal scenario we could circumvent the human and utilize computer vision algorithms to determine whether the car should brake for something in the road. In that case, the interface only needs to alert a human when something ambiguous appears that needs a human’s expert guidance. This rather strong way of guiding actions sidesteps the visual attention problem, and it prioritizes actions taken over information displayed. As another example, we can display a “possible danger” alert more often in visually distracting situations where humans are more likely to fail than in visually simple situations. This, too, shows a prioritization of attention at times when humans usually choose the wrong action, rather than giving equal priority every time cameras detect a possible hazard. Many current visual interfaces err on the side of alerting for danger too frequently rather than not often enough, because the damage caused by running into a hazard outweighs the damage caused by displaying an alert where there is no hazard. However, given enough false positives, people will start ignoring alerts, possibly leading to inaction at a critical moment.
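One way to picture the second dashboard example is an alert threshold that relaxes as the scene gets more visually distracting. The Python sketch below is only that, a picture: the `should_alert` function, the 0-to-1 clutter score, and the threshold numbers are assumptions for illustration, not values from any real system.

```python
def alert_threshold(clutter: float, base: float = 0.8, max_drop: float = 0.4) -> float:
    """Hazard confidence needed to alert; lower when the scene is more cluttered.

    `clutter` is a hypothetical 0-to-1 score of how visually distracting the
    scene is. `base` and `max_drop` are illustrative numbers, not tuned values.
    """
    clutter = min(max(clutter, 0.0), 1.0)  # clamp to the expected range
    return base - max_drop * clutter


def should_alert(hazard_confidence: float, clutter: float) -> bool:
    """Alert only when detector confidence clears the clutter-adjusted threshold."""
    return hazard_confidence >= alert_threshold(clutter)


# The same moderately confident detection is shown only in the busy scene.
quiet_road = should_alert(hazard_confidence=0.6, clutter=0.0)
busy_street = should_alert(hazard_confidence=0.6, clutter=1.0)
```

Keeping the threshold high in simple scenes is one possible way to limit false positives there, so that alerts shown in busy scenes are less likely to be ignored.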

We have a duty to alert people to all possible hazards but we also have a duty to maximize successful error avoidance. Successful visual interface designs will prioritize the second, while attempting to also achieve the first.

Kat Snyder | UX Research | Palo Alto, CA
