Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements (2024)



Psychol Sci Public Interest. 2019 Dec;20(1):1-68.



It is commonly assumed that a person’s emotional state can be readily inferred from the person’s facial movements, typically called “emotional expressions” or “facial expressions.” This assumption influences legal judgments, policy decisions, national security protocols, and educational practices, guides the diagnosis and treatment of psychiatric illness, as well as the development of commercial applications, and pervades everyday social interactions as well as research in other scientific fields such as artificial intelligence, neuroscience, and computer vision. In this paper, we survey examples of this widespread assumption, which we refer to as the “common view”, and then examine the scientific evidence for this view with a focus on the six most popular emotion categories used by consumers of emotion research: anger, disgust, fear, happiness, sadness and surprise. The available scientific evidence suggests that people do sometimes smile when happy, frown when sad, scowl when angry, and so on, more than what would be expected by chance. Yet there is substantial variation in how people communicate anger, disgust, fear, happiness, sadness and surprise, across cultures, situations, and even within a single situation. Furthermore, similar configurations of facial movements variably express instances of more than one emotion category. In fact, a given configuration of facial movements, such as a scowl, often communicates something other than an emotional state. Scientists agree that facial movements convey a range of social information and are important for social communication, emotional or otherwise. But our review suggests there is an urgent need for research that examines how people actually move their faces to express emotions and other social information in the variety of contexts that make up everyday life, as well as careful study of the mechanisms by which people perceive instances of emotion in one another. 
We make specific research recommendations that will yield a more valid picture of how people move their faces to express emotions, and how they infer emotional meaning from facial movements, in situations of everyday life. This research is crucial to provide consumers of emotion research with the translational information they require.

Executive Summary

It is commonly assumed that a person’s face gives evidence of emotions because there is a reliable mapping between a certain configuration of facial movements, called a “facial expression,” and the specific emotional state that it supposedly signals. This common view of facial expressions remains entrenched among consumers of emotion research, as well as among some scientists, despite an emerging consensus among affective scientists that emotional expressions are considerably more context-dependent and variable. Nonetheless, this common view continues to fuel commercial applications in industry and government (e.g., automated detection of emotions from faces), guide how children are taught (e.g., with posters and books showing stereotyped facial expressions), and impact clinical and legal applications (e.g., diagnoses of psychiatric illnesses and courtroom decisions). In this paper, we evaluate the common view of facial expressions against a review of the evidence and conclude that it rests on a number of flawed assumptions and incorrect interpretations of research findings. Our review is the most comprehensive and systematic to date, encompassing studies of healthy adults across cultures, newborns and young children, as well as people who are congenitally blind. It confirms that specific emotion categories (anger, disgust, fear, happiness, sadness, and surprise) are each expressed with a particular configuration of facial movements more reliably than would be expected by mere chance. Contrary to the common view, however, instances of these emotion categories are NOT expressed with facial movements that are sufficiently reliable and specific across contexts, individuals, and cultures to be considered diagnostic displays of any emotional state. Nor do human perceivers, in fact, infer emotions from particular configurations of muscle movements in a sufficiently reliable and specific way that similarly generalizes. 
Studies of expression production and perception both demonstrate multiple sources of variability that contradict the common view that smiles, scowls, frowns, and the like, are reliable and specific “expressions of emotion.” We conclude the paper with specific recommendations for both scientists and consumers of science.


Faces are a ubiquitous part of everyday life for humans. We greet each other with smiles or nods. We have face-to-face conversations on a daily basis, whether in person or via computers. We capture faces with smartphones and tablets, exchanging photos of ourselves and of each other on Instagram, Snapchat, and other social media platforms. The ability to perceive faces is one of the first capacities to emerge after birth: an infant begins to perceive faces within the first few days of life, equipped with a preference for face-like arrangements that allows the brain to wire itself, with experience, to become expert at perceiving faces (Arcaro et al., 2017; Cassia et al., 2004; Grossmann, 2015; Ghandi et al., 2017; Smith et al., 2018; Turati, 2004; but see for a more qualified claim).1 Faces offer a rich, salient source of information for navigating the social world: they play a role in deciding who to love, who to trust, who to help, and who is found guilty of a crime (Todorov, 2017; Zebrowitz, 1997, 2017; Zhang, Chen & Yang, 2018). Dating back to the ancient Greeks (Aristotle, in 4th century BCE) and Romans (Cicero), various cultures have viewed the human face as a window on the mind. But to what extent can a raised eyebrow, a curled lip, or a narrowed eye reveal what someone is thinking or feeling, allowing a perceiver’s brain to guess what that someone will do next?2 The answers to these questions have major consequences for human outcomes as they unfold in the living room, the classroom, the courtroom and even on the battlefield. They also powerfully shape the direction of research in a broad array of scientific fields, from basic neuroscience to psychiatry research.

Understanding what facial movements might reveal about a person’s emotions is made more urgent by the fact that many people believe we already know. Specific configurations of facial muscle movements appear as if they summarily broadcast or display a person’s emotions, which is why they are routinely referred to as “emotional expressions” and “facial expressions.”3 A simple Google search using the phrase “emotional facial expressions” [see Box 1, in supplementary on-line materials (SOM)] reveals the ubiquity with which, at least in certain parts of the world, people believe that certain emotion categories are reliably signaled or revealed by certain facial muscle movement configurations, a set of beliefs we refer to as the common view (also called the classical view; Barrett, 2017a). Similarly, many cultural products testify to the common view. Here are several examples:

  • Technology companies are investing tremendous resources to figure out how to objectively “read” emotions in people by detecting their presumed facial expressions, such as scowling faces, frowning faces and smiling faces, in an automated fashion. Several companies claim to have already done it (e.g., https://www.affectiva.com/what/products/; https://azure.microsoft.com/en-us/services/cognitive-services/emotion/). For example, Microsoft’s Emotion API promises to take video images of a person’s face to detect what that individual is feeling. The application states: “The emotions detected are anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. These emotions are understood to be cross-culturally and universally communicated with particular facial expressions” (https://azure.microsoft.com/en-us/services/cognitive-services/emotion/).

  • Countless electronic messages are annotated with emojis or emoticons that are schematized versions of the proposed facial expressions for various emotion categories (https://www.apple.com/newsroom/2018/07/apple-celebrates-world-emoji-day/).

  • Putative emotional expressions are taught to preschool children by displaying scowling faces, frowning faces, smiling faces and so on, in posters (e.g., use “feeling chart for children” in a Google image search), games (https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=miniland+emotion) and books (e.g., Cain, 2000; Parr, 2005), and on episodes of Sesame Street (among many examples, see https://www.youtube.com/watch?v=ZxfJicfyCdg, https://vimeo.com/108524970, or https://www.youtube.com/watch?v=y28GH2GoIvc).4

  • Television shows (e.g., Lie to Me), movies (e.g., Inside Out) and documentaries (e.g., The Human Face, produced by the British Broadcasting Company) customarily depict certain facial configurations as universal expressions of emotions.

  • Magazine and newspaper articles routinely feature stories in kind: facial configurations depicting a scowl are referred to as “expressions of anger,” facial configurations depicting a smile are referred to as “expressions of happiness,” facial configurations depicting a frown are referred to as “expressions of sadness,” and so on.

  • Agents of the U.S. Federal Bureau of Investigation (FBI) and the Transportation Security Administration (TSA) were trained to detect emotions and other intentions using these facial configurations, with the goal of identifying and thwarting terrorists (Rhonda Heilig, special agent with the FBI, personal communication, December 15, 2014, 11:20 am; https://how-emotions-are-made.com/notes/Screening_of_Passengers_by_Observation_Techniques).5

  • The facial configurations that supposedly diagnose emotional states also figure prominently in the diagnosis and treatment of psychiatric disorders. One of the most widely used tasks in autism research, the “Reading the Mind in the Eyes Test”, asks patients to match photos of the upper (eye) region of a posed facial configuration with specific mental state words, including emotion words (Baron-Cohen et al., 2001). Treatment plans for people living with autism and other brain disorders often include learning to recognize these facial configurations as emotional expressions (Baron-Cohen et al., 2004). This training does not generalize well to real-world skills, however (Bergren et al., 2018).

  • “Reading” the emotions of a defendant (in the words of Supreme Court Justice Anthony Kennedy, to “know the heart and mind of the offender”) is one pillar of a fair trial in the U.S. legal system and in many legal systems in the Western world (see Riggins v. Nevada, 1992). Legal actors like jurors and judges routinely rely on facial movements to determine the guilt and remorse of a defendant (e.g., Bandes, 2014; Zebrowitz, 1997). For example, defendants who are perceived as untrustworthy receive harsher sentences than they otherwise would (, 2016), and such perceptions are more likely when a person appears to be angry (i.e., when the person’s facial structure is similar to the hypothesized facial expression of anger, which is a scowl; Todorov, 2017). An incorrect inference about a defendant’s emotional state can cost someone her children, her freedom, or even her life (for recent examples, see Barrett, 2017, beginning on page 183).

But can a person’s emotional state be reasonably inferred from that person’s facial movements? In this paper, we offer a systematic review of the evidence, testing the common view that instances of emotion are signaled with a distinctive configuration of facial movements with enough consistency that it can serve as a diagnostic marker of those instances. We focus our review on evidence pertaining to six emotion categories that have received the lion’s share of attention in the scientific literature (anger, disgust, fear, happiness, sadness and surprise) and that, correspondingly, are the focus of the common view (as evidenced by our Google search, summarized in Box 1, SOM), but our conclusions apply to all emotion categories that have thus far been scientifically studied. We open the paper with a brief discussion of its scope, approach, and intended audience. We then summarize evidence on how people actually move their faces during episodes of emotion, referred to as studies of expression production, following which we examine evidence for which emotions are actually inferred from looking at facial movements, referred to as studies of emotion perception. We identify three key shortcomings in the scientific research that have contributed to a general misunderstanding about how emotions are expressed and perceived in facial movements, and that limit the translation of this scientific evidence for other uses:

  1. limited reliability (instances of the same emotion category are neither reliably expressed through nor perceived from a common set of facial movements);

  2. lack of specificity (there is no unique mapping between a single configuration of facial movements and instances of the same emotion category); and,

  3. limited generalizability (the effects of context and culture have not been sufficiently documented and accounted for).

We then discuss our conclusions, followed by proposals for consumers on how they might use the existing scientific literature. We also provide recommendations for future research with consumers of emotion research in mind. We have included additional detail on some topics of import or interest in the supplementary on-line materials (SOM).

Scope, Approach and Intended Audience of Paper

The Common View: Reading an Inner Emotional State of Mind From A Set of Unique Facial Movements

In common English parlance, people refer to “emotions” or “an emotion” as if anger, happiness, or any emotion word refers to an object that is highly similar on every occurrence. But an emotion word refers not to a unitary entity, but to a category of instances that vary from one another in their physical features, such as facial expressions and bodily changes, and mental features. Few scientists who study emotion, if any, take the view that every instance of an emotion category, such as anger, is identical to every other instance, sharing a set of necessary and sufficient features across situations, people and cultures. For example, Keltner and Cordaro (2017) recently wrote, “there is no one-to-one correspondence between a specific set of facial muscle actions or vocal cues and any and every experience of emotion” (p. 62). Yet there is considerable scientific debate about the amount of the within-category variation, the specific features that vary, the causes of the within-category variation, and implications of this variation for the nature of emotion (see Figure 1).

Figure 1.

Explanatory frameworks guiding the science of emotion: The nature of emotion categories and their concepts.

Figure is plotted along two dimensions. Horizontal: represents hypotheses about the surface similarities shared by instances of the same emotion category (e.g., the facial movements that express instances of the same emotion category). Vertical: represents hypotheses about the deep similarities in the mechanisms that cause instances of the same emotion category (e.g., to what extent do instances in the same category share deep, causal features?). Colors represent the type of emotion categories that are proposed in each theoretical framework (green = ad hoc, abstract categories; yellow = prototype or theory-based categories; red = natural kind categories).

One popular scientific framework, referred to as the basic emotion approach, hypothesizes that instances of an emotion category are expressed with facial movements that vary, to some degree, around a typical set of movements (called a prototype; for example, see Table 1). For example, it is hypothesized that in one instance, anger might be expressed with the expressive prototype (e.g., brows furrowed, eyes wide, lips tightened) plus additional facial movements, such as a widened mouth, whereas on other occasions, a facial movement in the prototype might be missing (e.g., anger might be expressed with narrowed eyes or without movement in the eyebrow region; for a discussion, see Box 2, in SOM). Nonetheless, the basic emotion approach still assumes that the core facial configuration (the prototype) can be used to diagnose a person’s inner emotional state in much the same way that a fingerprint can be used to uniquely recognize a person. More substantial variation in expressions (e.g., smiling in anger, gasping with widened eyes in anger, and scowling not in anger, but in confusion or concentration) is typically explained as the result of some process that is independent of an emotion itself, such as display rules, emotion regulation strategies such as suppressing the expression, or culture-specific dialects (as proposed by various scientists, including Elfenbein, 2013, 2017; Matsumoto, 1990).

Table 1.

A comparison of the facial configurations listed as the expressions of selected emotion categories

Proposed Expressive Configurations Described as Facial Action Units

| Emotion Category | Darwin’s (1872) Description (Matsumoto, Keltner, Shiota, O’Sullivan & Frank, 2008) | Observed in Research (Matsumoto et al., 2008) | Reference Configuration Used (Cordaro, Sun, Keltner, Kamble, Huddar & McNeil, 2017) | International Core Pattern (Cordaro et al., 2017) | Keltner et al. (in press) | Physical Description |
| --- | --- | --- | --- | --- | --- | --- |
| Amusement | Not listed | Not listed | 6, 12, 26 or 27, 55 or 56, a “head bounce” | 6, 7, 12, 16, 25, 26 or 27, 53 | 6+7+12+25+26+53 | Head back, Duchenne smile (6, 7, 12), lips separated, jaw dropped |
| Anger | 4+5+24+38 | 4+5 or 7+22+23+24 | 4+5+7+23 | 4, 7 | 4+5+17+23+24 | Brows furrowed, eyes wide, lips tightened and pressed together |
| Awe | Not listed | Not listed | 1, 5, 26 or 27, 57 and visible inhalation (Shiota et al., 2003) | 1, 2, 5, 12, 25, 26 or 27, 53 | Not listed | |
| Contempt | 9+10+22+41+61 or 62 | 12 (unilateral) + 14 (unilateral) | 12+14 (Ekman et al., 1983) | 4, 14, 25 | Not listed | |
| Disgust | 10+16+22+25 or 26 | 9 or 10, 25 or 26 | 9+15+16 (Ekman et al., 1983) | 4, 6, 7, 9, 10, 25, 26 or 27 | 7+9+19+25+26 | Eyes narrowed, nose wrinkled, lips parted, jaw dropped, tongue show |
| Embarrassment | Not listed | Not listed | 12, 24, 51, 54, 64 (Keltner & Buswell, 1997) | 6, 7, 12, 25, 54 (participant dampens smile with 23, 24, frown, etc.) | 7+12+15+52+54+64 | Eyelids narrowed, controlled smile, head turned and down (not scored with FACS: hand touches face) |
| Fear | 1+2+5+20 | 1+2+4+5+20, 25 or 26 | 1+2+4+5+7+20+26 (Ekman et al., 1983) | 1, 2, 5, 7, 25, 26 or 27, participant suddenly shifts entire body backwards in chair | 1+2+4+5+7+20+25 | Eyebrows raised and pulled together, upper eyelid raised, lower eyelid tense, lips parted and stretched |
| Happiness | 6+12 | 6+12 | 6+12 (Ekman et al., 1983) | 6, 7, 12, 16, 25, 26 or 27 | 6+7+12+25+26 | Duchenne smile (6, 7, 12) |
| Pride | Not listed | Not listed | 6, 12, 24, 53, a straightening of the back and pulling back of the shoulders to expose the chest (Shiota et al., 2003) | 7, 12, 53, participant sits up straight | 53+64 | Head up, eyes down |
| Sadness | 1+15 | 1+15, 4, 17 | 1+4+5 (Ekman et al., 1983) | 4, 43, 54 | 1+4+6+15+17 | Brows knitted, eyes slightly tightened, lip corners depressed, lower lip raised |
| Shame | Not listed | Not listed | 54, 64 (Keltner & Buswell, 1997) | 4, 17, 54 | 54+64 | Head down, eyes down |
| Surprise | 1+2+5+25 or 26 | 1+2+5+25 or 26 | 1+2+5+26 (Ekman et al., 1983) | 1, 2, 5, 25, 26 or 27 | 1+2+5+25+26 | Eyebrows raised, upper eyelid raised, lips parted, jaw dropped |

Note. Darwin’s description taken from Matsumoto et al. (2008), Table 13.1. International core patterns (ICPs) refer to expressions of 22 emotion categories that are thought to be conserved across cultures, taken from Cordaro et al. (2017), Tables 4, 5 and 6. A plus sign means “with”; these action units would appear simultaneously. A comma means “sometimes with”; these action units are statistically the most probable to appear, but do not necessarily need to happen simultaneously (David Cordaro, personal communication, 11/11/2018).

By contrast, other scientific frameworks propose that expressions of the same emotion category, such as anger, substantially vary by design, in a way that is tied to the immediate context, which includes the internal context (e.g., the person’s metabolic condition, the past experiences that come to mind, etc.) and the outward context (e.g., whether a person is at work, at school, or at home, who else is present, the broader cultural conditions, etc.), both of which vary in dynamic ways over time (see Box 2, SOM). These debates, while useful to scientists, provide little clear guidance for consumers of emotion research who are focused on the practical issue of whether various emotion categories are expressed with facial configurations of sufficient regularity and distinctiveness so that it is possible to read emotion in a person’s face.

The common view of emotional expressions persists, too, because scientists’ actions often don’t follow their claims in a transparent, straightforward way. Many scientists continue to design experiments, use stimuli and publish review papers that, ironically, leave readers with the impression that certain emotion categories each have a single, unique facial expression, even as those same scientists acknowledge that every emotion category can be expressed with a variable set of facial movements. Published studies typically test the hypothesis that there are unique emotion-expression links (for examples, see the reference lists in Keltner, Sauter, Tracy & Cowen, in press; also see most of the studies reviewed in this paper). The exact facial configuration tested varies slightly from study to study, but a core facial configuration is still assumed (see Table 1 for examples). This pattern of testing the hypothesis that instances of one emotion category are expressed with a single core facial configuration reinforces (perhaps unintentionally) the common view that each emotion category is consistently and uniquely expressed with its own distinctive configuration of facial movements. Review articles (again, perhaps unintentionally) reinforce the impression of unique face-emotion mappings by including tables and figures that display a single, unique facial configuration for each emotion category, referred to as the expression, signal or display for that emotion (Figure 2 presents two recent examples).6 Consumers of this research then assume that a distinctive configuration can be used to diagnose the presence of the corresponding emotion (e.g., that a scowl indicates the presence of anger).

Figure 2.

Example figures from recently published papers that reinforce the common belief in diagnostic facial expressions of emotion.

A. Adapted from Cordaro et al. (in press), Table 1, with permission. Face photos © Dr. Lenny Kristal. B. , Figure 2, with permission.

The common view of emotional expressions has also been imported into other scientific disciplines with an interest in understanding emotions, such as neuroscience and artificial intelligence (AI). For example, from a published paper on AI:

“American psychologist Ekman noticed that some facial expressions corresponding to certain emotions are common for all the people independently of their gender, race, education, ethnicity, etc. He proposed the discrete emotional model using six universal emotions: happiness, surprise, anger, disgust, sadness and fear.” (, p. 1, italics in the original)

Similar examples come from our own papers. One series of papers focused on the brain structures involved in perceiving emotions from facial configurations (Adolphs, 2002; Adolphs et al., 1994) and the other focused on early life experiences (Pollak et al., 2000). These papers were framed in terms of “recognizing facial expressions of emotion” and exclusively presented participants with specific, posed photographs of scowling faces (the presumed facial expression for anger), wide-eyed gasping faces (the presumed facial expression for fear), and so on. Participants were shown faces of different individuals all posing the same facial configuration for each emotion category, ignoring the importance of context. One reason for this flawed approach to investigating the perception of emotion from faces was that then (at the time these studies were conducted), as now, published experiments, review articles, and stimulus sets were dominated by the common view that certain emotion categories were signaled with an invariant set of facial configurations, referred to as “facial expressions of basic emotions.”

In this paper, we review the scientific evidence that directly tests two beliefs that form the common view of emotional expressions: that certain emotion categories are each routinely expressed by a unique facial configuration and, correspondingly, that people can reliably infer someone else’s emotional state from a set of facial movements. Our discussion is written for consumers of emotion research, whether they be scientists in other fields or non-scientists, who need not have deep knowledge of the various theories, debates, and broad range of findings in the science of emotion, with sufficient pointers to those discussions if they are of interest (see Box 2, SOM).

In discussing what this paper is about (the common view that a person’s inner emotional state is revealed in facial movements), it bears mentioning what this paper is not about: This paper is not a referendum on the “basic emotion” view we briefly mentioned earlier in this section, proposed by the psychologist Paul Ekman and his colleagues, or on any other research program or psychologist’s view. Ekman’s theoretical approach has been highly influential in research on emotion for much of the past 50 years. We often cite studies inspired by the basic emotion approach for this reason. In addition, the common view of emotional expressions is most readily associated with a simplified version of the basic emotion approach, as exemplified by the quotes above. Critiques of Ekman’s basic emotion view (and related views) are numerous (e.g., Barrett, 2006a, 2007, 2011; Russell, 1991, 1994, 1995), as are rejoinders that defend it (e.g., Ekman, 1992, 1994; Izard, 2007). Our paper steps back from this dialogue. We instead take as our focus the existing research on emotional expression and emotion perception and ask whether it is sufficiently strong to justify the way it is increasingly being used by those who consume it.

A Systematic Approach for Evaluating the Scientific Evidence

When you see someone smile and infer that the person is happy, you are making what is known as a reverse inference: you are assuming that the smile reveals something about the person’s emotional state that you cannot access directly (see Figure 3). Reverse inference requires calculating a conditional probability: the probability that a person is in a particular emotion episode (such as happiness) given the observation of a unique set of facial muscle movements (such as a smile). The conditional probability is written as:

p[emotion category | a unique facial configuration]

for example,

p[happiness | a smiling facial configuration]

Figure 3.

Evaluation criteria: Reliability and specificity in relation to forward and reverse inference.

Anger and fear are used as the example categories.

Reverse inferences about emotion are ubiquitous in everyday life – whenever you experience someone as emotional, your brain has performed a reverse inference, guessing at the cause of a facial movement when only having access to the movement itself. Every time an app on a phone or computer measures someone’s facial muscle movements, identifies a facial configuration such as a frowning facial configuration, and proclaims that the target person is sad, that app has engaged in reverse inference, such as:

p[sadness | a frowning facial configuration]

Whenever a security agent infers anger from a scowl, the agent has assumed a strong likelihood for

p[anger | a scowling facial configuration]
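
The relation between a forward inference (how likely a facial configuration is, given an emotion) and a reverse inference (how likely an emotion is, given a facial configuration) can be made concrete with Bayes' rule. The Python sketch below is illustrative only: the function name and all three input probabilities are hypothetical values chosen for the example, not estimates drawn from the studies reviewed in this paper.

```python
def reverse_inference(p_face_given_emotion, p_emotion, p_face_given_other):
    """Return p[emotion | face] via Bayes' rule.

    p_face_given_emotion: forward inference, p[face | emotion]
    p_emotion:            base rate of the emotion category
    p_face_given_other:   p[face | any other state], i.e. how often the
                          facial configuration occurs without the emotion
    """
    # Total probability of observing the facial configuration at all.
    p_face = (p_face_given_emotion * p_emotion
              + p_face_given_other * (1.0 - p_emotion))
    return p_face_given_emotion * p_emotion / p_face


# Hypothetical inputs (assumptions for illustration, not empirical values).
p_scowl_given_anger = 0.30      # scowl appears in 30% of anger episodes
p_anger = 0.10                  # anger accounts for 10% of episodes
p_scowl_given_not_anger = 0.10  # scowl appears in 10% of other episodes

p_anger_given_scowl = reverse_inference(
    p_scowl_given_anger, p_anger, p_scowl_given_not_anger
)
print(round(p_anger_given_scowl, 2))  # 0.25
```

With these assumed inputs, a scowl indicates anger in only 25% of cases, because scowls also occur, at a lower rate, in the far more numerous episodes that are not anger. The strength of a reverse inference therefore depends on specificity and base rates as well as on the forward probability.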

Four criteria must be met to justify a reverse inference that a particular facial configuration expresses and therefore reveals a specific emotional state: reliability, specificity, generalizability and validity (explained in Table 2 and Figure 3). These criteria are commonly encountered in the field of psychological measurement, and over the last several decades there has been an ongoing dialogue about thresholds for these criteria as they apply in production and perception studies, with some consensus emerging for the first three criteria. Only when a pattern of facial muscle movements strongly satisfies these four criteria can we justify calling it an “emotional expression.” If any of these criteria are not met, then we should instead refer to a facial configuration with more neutral, descriptive terms without making unwarranted inferences, simply calling it a smile (rather than an expression of happiness), a frown (rather than an expression of sadness), a scowl (rather than an expression of anger), and so on.7
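
The chance levels and reliability bands described in Table 2 can be expressed as a short calculation. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, chance is assumed to be uniform across the response alternatives, and the treatment of rates that fall between or outside the bands listed in Table 2 is an assumption made for this example.

```python
def chance_level(n_alternatives: int) -> float:
    """Chance rate when each of n alternatives is equally likely (assumed)."""
    if n_alternatives < 1:
        raise ValueError("n_alternatives must be at least 1")
    return 1.0 / n_alternatives


def support_band(rate_percent: float) -> str:
    """Classify a reliability rate (in percent) using the Table 2 bands.

    Table 2 lists 70-90% as strong, 40-69% as moderate and 20-39% as weak.
    Treating the bands as half-open intervals, and labeling everything else
    as not covered, are assumptions of this sketch.
    """
    if 70 <= rate_percent <= 90:
        return "strong"
    if 40 <= rate_percent < 70:
        return "moderate"
    if 20 <= rate_percent < 40:
        return "weak"
    return "not covered by the listed bands"


print(chance_level(2))                 # 0.5
print(round(100 * chance_level(51)))   # 2
print(support_band(75), support_band(55), support_band(25))  # strong moderate weak
```

The first two calls reproduce the chance levels given as examples in Table 2: 50% for two emotion categories and about 2% for 51 facial configurations.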

Table 2:

Criteria used to evaluate the empirical evidence

Each criterion is described first for studies of expression production and then for studies of emotion perception.

Reliability

Expression production: When a person is sad, the proposed expression (a frowning facial configuration) should be observed more frequently than would be expected by chance. Likewise, for every other emotion category that is subject to a commonsense belief. Reliability is related to a forward inference: given that someone is happy, what is the likelihood of observing a smile, p[set of facial muscle movements | emotion category].

Emotion perception: When a person makes a scowling facial configuration, perceivers should consistently infer that the person is angry. Likewise, for every facial configuration that has been proposed as the expression of a specific emotion category. That is, perceivers must consistently make a reverse inference: given that someone is scowling, what is the likelihood that he is angry, p[emotion category | set of facial muscle movements].

Expression production: Chance means that facial configurations occur randomly with no predictable relationship to a given emotional state. This would mean that the facial configuration in question carries no information about the presence or absence of an emotion category. For example, in an experiment that observes the facial configurations associated with instances of happiness and anger, chance levels of scowling or smiling would be 50%.

Emotion perception: Chance means that emotional states occur randomly with no predictable relationship to a given facial configuration. This would mean that the presence or absence of an emotion category cannot be inferred from the presence or absence of the facial configuration. For example, in an experiment that observes how people perceive 51 different facial configurations, chance levels for correctly labeling a scowling face as anger would be 2%.

Expression production: Reliability also depends on the base rate: how frequently people make a particular facial configuration overall. For example, if a person frequently makes a scowling facial configuration during an experiment examining the expressions of anger, sadness and fear, he will seem to be consistently scowling in anger when in fact he is scowling indiscriminately.

Emotion perception: Reliability also depends on the base rate: how frequently people use a particular emotion label or make a particular emotional inference. For example, if a person frequently labels facial configurations as “angry” during an experiment examining scowling, smiling and frowning faces, she will seem to be consistently perceiving anger when in fact she is labeling indiscriminately.

Expression production and emotion perception: Reliability rates between 70% and 90% provide strong evidence for the commonsense view, between 40% and 69% provide moderate support for the commonsense view, and between 20% and 39% provide weak support (Ekman, 1994; ; Russell, 1994).

Specificity

Expression production: If a facial configuration is diagnostic of a specific emotion category, then the facial configuration should express instances of one and only one emotion category better than chance; it should not consistently express instances of any other mental event (emotion or otherwise) at better than chance levels. For example, to be considered the expression of anger, a scowling facial configuration must not express sadness, confusion, indigestion, an attempt to socially influence, etc. at better than chance levels.

Emotion perception: If a frowning facial configuration is perceived as the diagnostic expression of sadness, then a frowning facial configuration should only be labeled as sadness (or sadness should only be inferred from a frowning facial configuration) at above chance levels. And it should not be consistently perceived as an expression of any mental state other than sadness at better than chance levels.

Expression production and emotion perception: Estimates of specificity, like reliability, depend on base rates and on how chance levels are defined.

Generalizability

Expression production and emotion perception: Patterns of reliability and specificity should replicate across studies, particularly when different populations are sampled, such as infants, congenitally blind individuals and individuals sampled from diverse cultural contexts, including small-scale, remote cultures. High generalizability across different circumstances ensures that scientific findings are generalizable.

Validity

Expression production: Even if a facial configuration is consistently and uniquely observed in relation to a specific emotion category across many studies (strong generalizability), it is necessary to demonstrate that the person in question is really in the expected emotional state. This is the only way that a given facial configuration leads to accurate inferences about a person’s emotional state. A facial configuration is valid as a display or a signal for emotion if and only if it is strongly associated with other measures of emotion, preferably those that are objective and do not rely on anyone’s subjective report (i.e., a facial configuration should be strongly and consistently related to perceiver-independent evidence about the emotional state of the expresser).

Emotion perception: Even if a facial configuration is consistently and uniquely labeled with a specific emotion word across many studies (strong generalizability), it is necessary to demonstrate that the person making the facial configuration is really in the expected emotional state. This is the only way that a given perception or inference of emotion is accurate. A perceiver can be said to be recognizing an emotional expression if and only if the person being perceived is verifiably in the expected emotional state.

Note: Reliability is also related to sensitivity, consistency, informational value, and the true positive rate (for further description, see Figure 3). Specificity is related to uniqueness, discreteness, the true negative rate and referential specificity. In principle, we can also ask more parametrically whether there is a link between the intensity of an emotional instance and the intensity of facial muscle contractions, but scientists rarely do.
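The distinction in Table 2 between the forward inference, the reverse inference and the base rate can be made concrete with a small calculation. The sketch below is illustrative only: the function name `conditional_rates` and the counts are hypothetical and are not data from any study reviewed here. It tallies paired observations of an emotion label and a facial configuration and returns the three quantities.

```python
def conditional_rates(observations, emotion, face):
    """Estimate forward and reverse inference rates from paired observations.

    observations: list of (emotion_label, facial_configuration) pairs.
    Returns p[face | emotion], p[emotion | face] and the base rate of the face.
    """
    n_total = len(observations)
    n_emotion = sum(1 for e, _ in observations if e == emotion)
    n_face = sum(1 for _, f in observations if f == face)
    n_both = sum(1 for e, f in observations if e == emotion and f == face)
    if n_emotion == 0 or n_face == 0:
        raise ValueError("need at least one instance of the emotion and of the face")
    return {
        "forward": n_both / n_emotion,      # p[face | emotion]
        "reverse": n_both / n_face,         # p[emotion | face]
        "base_rate_face": n_face / n_total  # how often the face occurs overall
    }


# Hypothetical counts: a participant who scowls indiscriminately.
observations = (
    [("anger", "scowl")] * 12 + [("anger", "other")] * 28
    + [("sadness", "scowl")] * 18 + [("sadness", "other")] * 42
)
rates = conditional_rates(observations, "anger", "scowl")
print(rates)
```

In these made-up counts the forward rate p[scowl | anger] equals the overall base rate of scowling, so the scowl carries no information about anger even though it is observed in 30% of anger episodes.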

The Null Hypothesis and the Role of Context

Tests of reliability, specificity, generalizability and validity are almost always compared to what would be expected by sheer chance, if facial configurations (in studies of expression production) and inferences about facial configurations (in studies of emotion perception) occurred randomly with no relation to particular emotional states. In most studies, chance levels constitute the null hypothesis. An example of the null hypothesis for reliability is that people do not scowl when angry more frequently than would be expected by chance.8 If people are observed to scowl more frequently when angry than they would by chance, then the null hypothesis can be rejected based on the reliability of the findings. We can also test the null hypothesis for specificity: If people scowl more frequently than they would by chance not only when angry but also when fearful, sad, confused, hungry, etc., then the null hypothesis for specificity is retained.9
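As a rough illustration of testing against chance, the sketch below computes a chance level from the number of response options and a one-sided exact binomial probability of seeing at least the observed number of matches under the null hypothesis. The choice of a binomial test, the function names and the counts are assumptions made for illustration; the studies reviewed here use a variety of statistical procedures.

```python
from math import comb


def chance_level(n_options):
    """Chance rate when each of n_options is equally likely."""
    return 1 / n_options


def binomial_p_value(successes, trials, chance):
    """One-sided probability of observing at least `successes` matches
    in `trials` independent trials if the true rate equals `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: 15 scowls in 20 anger episodes, with two possible
# configurations (scowl or smile), so chance is 50% as in Table 2.
p_value = binomial_p_value(15, 20, chance_level(2))
print(round(p_value, 4))

# With 51 response options, chance is roughly 2%, as in Table 2.
print(round(chance_level(51), 3))
```

A small probability here only licenses rejecting the null hypothesis for reliability; it says nothing about specificity, generalizability or validity.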

In addition to testing hypotheses about reliability and specificity, tests of generalizability are becoming more common in the research literature, again using the null hypothesis. Questions about generalizability test whether a finding in one experiment is reproduced in other experiments in different contexts, using different experimental methods or sampling people from different populations. There are two crucial questions about generalizability when it comes to the production and perception of emotional expressions: Do the findings from a laboratory experiment generalize to observations in the real world? And, do the findings from studies that sample participants from Westernized, Educated, Industrialized, Rich and Democratic (WEIRD; ) populations generalize to people who live in small-scale, remote communities?

Questions of validity are almost never addressed in production and perception studies. Even if reliable and specific facial movements are observed across generalizable circumstances, it is a difficult and unresolved question as to whether these facial movements can justify an inference about a person’s emotional state. We have more to say about this later. In this paper, we evaluate the common view by reviewing evidence pertaining to the reliability, specificity, and generalizability of research findings from production and perception studies.

A focus on rejecting the null hypothesis, defined by what would be expected by chance alone, provides necessary but not sufficient support for the common view of emotional expressions. A slightly above chance co-occurrence of a facial configuration and instances of an emotion category, such as scowling in anger -- for example, a correlation coefficient around r = .20 to .39 (adapted from ) -- suggests that a person sometimes scowls in anger, but not most or even much of the time. Weak evidence for reliability suggests that other factors not measured in the experiment are likely causing people to scowl during an instance of anger. It also suggests that people may express anger with facial configurations other than a scowl, possibly in reliable and predictable ways. Following common usage, we refer to these unmeasured factors collectively as context. A similar situation can be described for studies of emotion perception: when participants label a scowling facial configuration as “anger” in a weakly reliable way (between 20 and 39 percent of the time; ), then this suggests the possibility of unmeasured context effects.

In principle, context effects make it possible to test the common view by comparing it directly to an alternative hypothesis that a person’s brain will be influenced by other causal factors (as opposed to comparing the findings to random chance). It is possible, for example, that a state of anger is expressed differently depending on various factors that can be studied, including the situational context (such as whether a person is at work, at school, or at home), social factors (such as who else is present in the situation and the relationship between the expresser and the perceiver), the person’s internal physical context (based on how much sleep they had, how hungry they are, etc.), a person’s internal mental context (such as the past experiences that come to mind or the evaluations they make), the temporal context (what just occurred a moment ago), differences between people (such as whether someone is male or female, warm or distant), and the cultural context, such as whether the expression is occurring in a culture that values the rights of individuals (vs. group cohesion), is open and allows for a variety of behaviors in a situation (vs. closed, having more rigid rules of conduct). Other theoretical approaches offer some of these specific alternative hypotheses (see Box 2 in SOM). In practice, however, experiments almost always test the common view against the null hypothesis for reliability and specificity and rarely test specific alternative hypotheses. When context is acknowledged and studied, it is usually examined as a factor that might moderate a common and universal emotional expression, preserving the core assumptions of the common view (e.g., Cordaro et al., 2017; for more discussion, see Box 3, SOM).

A Focus on Six Emotion Categories: Anger, Disgust, Fear, Happiness, Sadness and Surprise

Our critical examination of the research literature in this paper focuses primarily on testing the common view of facial expressions for six emotion categories -- anger, disgust, fear, happiness, sadness and surprise. We do not include a discussion of every emotion category ever studied in the science of emotion. We do not discuss the many emotion categories that exist in non-English speaking cultures, such as gigil, the irresistible urge to pinch or squeeze something cute, or liget, exuberant, collective aggression (for discussion of non-English emotion categories, see ; Pavlenko, 2014; Russell, 1991). We do not discuss the various emotion categories that have been documented throughout history (e.g., Smith, 2016). Nor do we discuss every English emotion category for which a prototypical facial expression has been suggested. For example, recent studies motivated primarily by the basic emotion approach have suggested that there are “more than six distinct facial expressions …in fact, upwards of 20 multimodal expressions” (Keltner et al., in press, pg. 4), meaning that scientists have proposed a prototypic facial configuration as the facial expression for each of twenty or so emotion categories, including confusion, embarrassment, pride, sympathy, awe, and so on.

The reasons for our focus on six emotion categories are twofold. First, the anger, disgust, fear, happiness, sadness and surprise categories anchor common beliefs about emotions and their expressions (as is evident from Box 4, in SOM) and therefore represent the clearest, strongest test of the common view. Second, these six emotion categories have been the primary focus of systematic research for almost a century and therefore provide the largest corpus of scientific evidence that can be evaluated. Unfortunately, the same cannot be said for any of the other emotion categories in question. This is a particularly important point when considering the twenty plus emotion categories that are now the focus of research attention. A PsycInfo search for the term “facial expression” combined with “anger, disgust, fear, happiness, sadness, surprise” produced over 700 entries, but a similar search including “love, shame, contempt, hate, interest, distress, guilt” returned fewer than 70 entries (). Almost all cross-cultural studies of emotion perception have focused on just anger, disgust, fear, happiness, sadness and surprise (plus or minus a few) and experiments that measure how people spontaneously move their faces to express instances of emotion categories other than these six remain rare. In particular, there are too few studies that measure spontaneous facial movements during episodes of other emotion categories (i.e., production studies) to conclude anything about reliability and specificity, and there are too few studies of how these additional emotion categories are perceived in small-scale, remote cultures to conclude anything about generalizability. In an era where the generalizability and robustness of psychological findings are under close scrutiny, it seemed prudent to focus on the emotion categories for which there are, by a factor of ten, the largest number of published experiments. Our discussion, which is based on a sample of six emotion categories, nonetheless generalizes to the other emotion categories that have been studied.10

The proposed expressive facial configurations for each emotion category are presented in Figure 4, and the origin of these facial configurations is discussed in Box 4 in SOM. They originated with Charles Darwin, who stipulated (rather than discovered) that certain facial configurations are expressions of certain emotion categories, inspired by photographs taken by Duchenne and drawings made by the Scottish anatomist Charles Bell (Darwin, 1872). These stipulations largely form the basis of the common view of emotional expressions.

Figure 4:

Facial action ensembles for commonsense facial configurations.

Facial action coding system (FACS) codes that correspond to the commonsense expressive configuration in adults. A is the proposed expression for anger and corresponds to the prescribed EMFACS code for anger (AUs 4, 5, 7, and 23). B is the proposed expression for disgust and corresponds to the prescribed EMFACS code for disgust (AU 10). C is the proposed expression for fear and corresponds to the prescribed EMFACS code for fear (AUs 1, 2, and 5 or 5 and 20). D is the proposed expression for happiness and corresponds to the prescribed EMFACS code for the so-called Duchenne smile (AUs 6 and 12). E is the proposed expression for sadness and corresponds to the prescribed EMFACS code for sadness (AUs 1, 4, 11 and 15 or 1, 4, 15 and 17). F is the proposed expression for surprise and corresponds to the prescribed EMFACS code for surprise (AUs 1, 2, 5, and 26). It was originally proposed that infants express emotions with the same facial configurations as adults. Later research revealed morphological differences between the proposed expressive configurations for adults and infants. Only three out of a possible nineteen proposed configurations for negative emotions from the infant coding scheme were the same as the configurations proposed for adults (Oster et al., 1992). G. adapted from Cordaro et al. (in press), Table 1, with permission. Face photos © Dr. Lenny Kristal. H. adapted from , Figure 2, with permission.
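For readers who work with coded facial data, the AU sets listed in this caption can be written down as a simple lookup. The sketch below is illustrative only. It assumes that "AUs 1, 2, and 5 or 5 and 20" and the similar phrasing for sadness denote two alternative AU sets, and it uses a hypothetical matching rule (a proposed configuration counts as present when all of its AUs appear among the observed AUs). The names `PROPOSED_CONFIGURATIONS` and `matching_categories` are invented for this example.

```python
# Proposed adult configurations, transcribed from the Figure 4 caption.
# Each category maps to one or more alternative sets of action units (AUs).
PROPOSED_CONFIGURATIONS = {
    "anger": [{4, 5, 7, 23}],
    "disgust": [{10}],
    "fear": [{1, 2, 5}, {5, 20}],
    "happiness": [{6, 12}],
    "sadness": [{1, 4, 11, 15}, {1, 4, 15, 17}],
    "surprise": [{1, 2, 5, 26}],
}


def matching_categories(observed_aus):
    """Return the categories whose proposed configuration is fully contained
    in the observed AUs (an assumed matching rule, for illustration)."""
    observed = set(observed_aus)
    return sorted(
        category
        for category, variants in PROPOSED_CONFIGURATIONS.items()
        if any(variant <= observed for variant in variants)
    )


print(matching_categories({6, 12}))        # ['happiness']
print(matching_categories({1, 2, 5, 26}))  # ['fear', 'surprise']
```

Under this matching rule, the proposed surprise configuration also satisfies one of the proposed fear configurations, which shows how a purely descriptive AU code can be compatible with more than one emotion label. The lookup describes facial movements only; it does not establish that any emotional state is present.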

Producing Facial Expressions of Emotion: A Review of the Scientific Evidence

In this section, we first review the design of a typical experiment where emotions are induced and facial movements are measured. This review highlights several observations to keep in mind as we review the reliability, specificity and generalizability for expressions of anger, disgust, fear, happiness, sadness and surprise in a variety of populations, including adults in both urban and small-scale remote cultures, infants and children, and congenitally blind individuals. Our review is the most comprehensive to date and allows us to comment on whether the scientific findings generalize across different populations of individuals. The value of doing so becomes apparent when we observe how similar conclusions emerge from these research domains.

The Anatomy of a Typical Experiment Designed to Observe People’s Facial Movements During Episodes of Emotion

In the typical expression production experiment, scientists expose participants to objects, images or events that they (the scientists) believe will evoke an instance of emotion. It’s possible, in principle, to evoke a wide variety of instances for a given emotion category (e.g., Wilson-Mendenhall et al., 2015), but in practice, published studies evoke the most typical instances of each category, often elicited with a stimulus that is presented without context (e.g., a photograph, a short movie clip separated from the rest of the film, etc.). Scientists usually include some measure to verify that participants are in the expected emotional state (such as asking participants to describe how they feel by rating their experience against a set of emotion adjectives). They then observe participants’ facial movements during the emotional episode and then quantify how well the measure of emotion predicts the observed facial movements. When done properly, this yields estimates of reliability and specificity, and in principle provides data to assess generalizability. There are limitations to assessing the validity of a facial configuration as an expression of emotion, as we explain below.

Measuring facial movements.

Healthy humans have a common set of 17 facial muscle groups on each side of the face that contract and relax in patterns.11 To create facial movements that are visible to the naked eye, facial muscles contract, changing the distance between facial features () and shaping skin into folds and wrinkles on an underlying skeletal structure. Even when facial movements look the same to the naked eye, there may be differences in their execution under the skin. There are individual differences in mechanics of making a facial movement, including variation in the anatomical details (e.g., everyone has a slightly different configuration and relative size of the muscles, some people lack certain muscle components, etc.), in the neural control of those muscles (; ; Muri, 2015), and in the underlying skeletal structure of the face (discussed in Box 5, in SOM).

There are three common procedures for measuring facial movements in a scientific experiment. The most sensitive, objective measure of facial movements detects the electrical activity from actual muscular contractions, called facial electromyography, or facial EMG (again, see Box 5, in SOM). This is a perceiver-independent way of assessing facial movements that detects muscle contractions that are not necessarily visible to the naked eye (). Facial EMG’s utility is unfortunately offset by its impracticality: facial EMG requires placing electrodes on a participant’s face, which can cause skin abrasions. In addition, a person can typically tolerate only a few electrodes on the face at a time. At the writing of this paper, there were relatively few published papers using facial EMG (we identified 123 studies), the overwhelming majority of which sparsely sampled the face, measuring the electrical signals for only a small number of muscles (between one and six); none of the studies measured naturalistic facial movements as they occur outside the lab, in everyday life. As a consequence, we focus our discussion on two other measurement methods: a perceiver-dependent method that describes visible facial movements, called facial actions, which uses human coders who indicate the presence or absence of a facial movement while viewing video recordings of participants, and automated methods for detecting facial actions from photographs or videos.

Measuring facial movements with human coders.

The Facial Action Coding System, or FACS (Ekman et al., 2002), is a systematic approach to describe what a face looks like when facial movements have occurred. FACS codes describe the presence and intensity of facial movements. Importantly, FACS is purely descriptive and is therefore agnostic about whether those movements might express emotions or any other mental event.12 Human coders train for many weeks to reliably identify specific movements called “action units” or AUs. Each AU is hypothesized to correspond to the contraction of a distinct facial muscle or a distinct grouping of muscles that is visible as a specific facial movement. For example, the raising of the inner corners of the eyebrows (contracting the frontalis muscle pars medialis) corresponds to AU 1. Lowering of the inner corners of the brows (activation of the corrugator supercilii, depressor glabellae and depressor supercilii) corresponds to AU 4. AUs are scored and analyzed as independent elements, but the underlying anatomy of many facial muscles constrains them so they cannot move independently of one another, generating dependencies between AUs (e.g., see ). Facial action units (AUs) and their corresponding list of facial muscles can be found in Table 3. Expert FACS coders approach inter-rater reliabilities of .80 for individual AUs (). The first version of FACS () was largely based on the work of Swedish anatomist Carl-Herman Hjortsjö who catalogued the facial configurations described by Duchenne (Hjortsjö, 1969). In addition to the updated versions of FACS (Ekman et al., 2002), other facial coding systems have been devised for human infants (Izard et al., 1995; Oster, 2003), chimpanzees (Vick et al., 2007), and macaque monkeys (Parr et al., 2010).13 Figure 4 displays the common FACS codes for the configurations of facial movements that have been proposed as the expression of anger, disgust, fear, happiness, sadness and surprise.

Table 3:

The Facial Action Coding System (FACS; ) codes for adults

AU    Description            Facial muscles (type of activation)
1     Inner brow raiser      Frontalis (pars medialis)
2     Outer brow raiser      Frontalis (pars lateralis)
4     Brow lowerer           Corrugator supercilii, depressor supercilii
5     Upper lid raiser       Levator palpebrae superioris
6     Cheek raiser           Orbicularis oculi (pars orbitalis)
7     Lid tightener          Orbicularis oculi (pars palpebralis)
9     Nose wrinkler          Levator labii superioris alaeque nasi
10    Upper lip raiser       Levator labii superioris
11    Nasolabial deepener    Zygomaticus minor
12    Lip corner puller      Zygomaticus major
13    Cheeks puffer          Levator anguli oris
15    Lip corner depressor   Depressor anguli oris
16    Lower lip depressor    Depressor labii inferioris
17    Chin raiser            Mentalis
18    Lip puckerer           Incisivii labii superioris and incisivii labii inferioris
20    Lip stretcher          Risorius w/ platysma
22    Lip funneler           Orbicularis oris
23    Lip tightener          Orbicularis oris
24    Lip pressor            Orbicularis oris
25    Lips part              Depressor labii inferioris or relaxation of mentalis, or orbicularis oris
26    Jaw drop               Masseter, relaxed temporalis and internal pterygoid
27    Mouth stretch          Pterygoids, digastric
28    Lip suck               Orbicularis oris
41    Lid Droop
43    Eyes Closed

Measuring facial movements with automated algorithms.

Human coders require time-consuming, intensive training and practice before they can reliably assign AU codes. After training, it is a slow process to code photographs or videos frame by frame, making human FACS coding impractical to use on facial movements as they occur in everyday life. Large inventories of naturalistic photographs and videos, which have been curated only fairly recently (Benitez-Quiroz et al., 2016), would require decades to manually code. This problem is addressed by automated FACS coding systems using computer vision algorithms (; Martinez, 2017; Valstar et al., 2017).14 Recently developed computer vision systems have automated the coding of some (but not all) facial AUs (e.g., Benitez-Quiroz et al., in press; Benitez-Quiroz et al., 2017b; Chu et al., 2017; Corneanu et al., 2016; ; Martinez, 2017a; ; Valstar et al., 2017; see Box 6, SOM), making it more feasible to observe facial movements as they occur in everyday life, at least in principle (see Box 7, SOM). Automated FACS coding is accurate (>90%) when compared to the AU codes from expert human coders, provided that the images were captured under ideal laboratory conditions, where faces are viewed from the front, are well illuminated, are not occluded, and are posed in a controlled way (Benitez-Quiroz et al., 2016). Under ideal conditions, accuracy is highest (~99%) when algorithms are tested and trained on images from the same database (Benitez-Quiroz et al., 2016). The best of these algorithms works quite well when trained and tested on images from different databases (~90%), as long as the images are all taken in ideal conditions (Benitez-Quiroz et al., 2016).
Accuracy (compared to human FACS coding) decreases substantially more when coding facial actions in still images or in video frames taken in everyday life where conditions are unconstrained and facial configurations are not stereotypical (e.g., Yitzhak et al., 2017).15 For example, 38 automated FACS coding algorithms were recently trained on one million images (the 2017 EmotioNet Challenge; Benitez-Quiroz et al., 2017a) and evaluated against separate test images which were FACS coded by experts.16 In these less constrained conditions, accuracy dropped below 83% and a combined measure of precision and recall (a measure called F1, ranging from zero to one) was below .65 (Benitez-Quiroz et al., 2017a).17 These results indicate that current algorithms are not accurate enough in their detection of facial AUs to fully substitute for expert coders when describing facial movements in everyday life. Nonetheless, these algorithms offer a distinct practical advantage because they can be used in conjunction with human coders to speed up the study of facial configurations in millions of images in the wild. It is likely that automated methods will continue to improve as better and more robust algorithms are developed and as more diverse face images become available.
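Because accuracy and F1 are reported side by side above, it may help to see how the two can diverge. The sketch below computes both from the four cells of a confusion table that compares an algorithm's AU detections with expert FACS codes. The function name `detection_metrics` and the counts are hypothetical; they are not taken from the EmotioNet Challenge.

```python
def detection_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for detecting a single AU.

    tp: AU present according to experts and detected by the algorithm
    fp: AU absent according to experts but detected by the algorithm
    fn: AU present according to experts but missed by the algorithm
    tn: AU absent according to experts and not detected
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for an AU that occurs in only 55 of 1,000 images.
metrics = detection_metrics(tp=30, fp=20, fn=25, tn=925)
print({name: round(value, 3) for name, value in metrics.items()})
```

With these made-up counts, accuracy is high because the AU is usually absent and the algorithm usually says so, while F1 stays modest because nearly half of the true occurrences are missed. This is one reason a combined precision and recall measure is informative for facial actions that are rare in everyday images.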

Measuring an emotional state.

Once an approach has been chosen for measuring facial movements, a clear test of the common view of emotional expressions depends on having valid measures that reliably and specifically characterize the instances of each emotion category in a generalizable way, to which the measurements of facial muscle movements can be compared. The methods that scientists use to assess people’s emotional states vary in their dependence on human inference, however, which raises questions about the validity of the measures.

Relatively objective measures of an emotional instance.

The more objective end of the measurement spectrum includes dynamic changes in the autonomic nervous system (ANS), such as cardiovascular, respiratory or perspiration changes (measured as variations in skin conductance), and dynamic changes in the central nervous system, such as changes in blood flow or electrical activity in the brain. These measures are thought to be more objective because the measurements themselves (the numbers) do not require a human judgment (i.e., the measurements are perceiver-independent). Only the interpretation of the measurements (their psychological meaning) requires human inference. For example, a human observer does not judge whether skin conductance or neural activity increases or decreases; human judgment only comes into play when the measurements are interpreted for the emotional meaning.

Currently, there are no objective measures, either singly or as a pattern, that reliably and uniquely identify one emotion category from another in a replicable way. Statistical summaries of hundreds of experiments, called meta-analyses, show for example, that currently there is no relationship between an emotion category, such as anger, and a single, specific set of physical changes in ANS that accompany the instances of that category, even probabilistically (the most comprehensive study published to date is Siegel et al., 2018, but for earlier studies see Cacioppo et al., 2000; Stemmler, 2004; also see Box 8, SOM). In anger, for example, blood pressure can go up, go down, or stay the same (i.e., changes in blood pressure are not consistently associated with anger). And a rise in blood pressure is not unique to instances of anger; it also can occur during a range of other emotional episodes (i.e., changes in blood pressure do not specifically occur in anger and only in anger).18 Individual studies often find patterns of ANS measures that distinguish an instance of one emotion category from another, but those patterns don’t replicate and instead vary across studies, even when studies use the same methods and stimuli, and sample from the same population of participants (e.g., compare findings from with Stephens, Christie, & Friedman, 2010). Similar within-category variation is routinely observed for changes in neural activity measured with brain imaging (Lindquist et al., 2012) and single neuron recordings (). For example, pattern classification studies discover multivariate patterns of activity across the brain for emotion categories such as anger, sadness, fear, and so on, but these patterns do not replicate from study to study (e.g., ; Saarimäki et al., 2016; Wager et al., 2015; for a discussion, see Clark-Polner et al., 2017).
This observed variation does not imply that biological variability during emotional episodes is random, but rather that it may be context-dependent (e.g., yellow and green zones of Figure 1). It may also be the case that current biological measures are simply not sensitive or comprehensive enough to capture situated variation in a precise way. If this is so, then such variation should be considered unexplained, rather than random.

There is a difficult circularity built into these studies that is worth pointing out, and that we encounter again a few paragraphs down: Scientists must use some criterion for identifying when instances of an emotion category are present in the first place (so as to draw conclusions about whether or not emotion categories can be distinguished by different patterns of physical measurements).19 In most studies that attempt to find bodily or neural “signatures” of emotions, the criterion is a subjective one, either reported by the participants or provided by the scientist, which introduces problems of its own, as we discuss in the next section.

Subjective measures of an emotional instance.

Without objective measures to identify the emotional state of a participant, scientists typically rely on the relatively more subjective measures that anchor the other end of the measurement spectrum. The subjective judgments can come from the participants (who complete self-report measures), from other observers (who infer emotion in the participants), or from the scientists themselves (who use a variety of criteria, including common sense, to infer the presence of an emotional episode). These are all examples of perceiver-dependent measurements because the measurements themselves, as well as their interpretation, directly rely on human inference.

Scientists often rely on their own judgments and intuitions to stipulate when an emotion is present or absent in participants (as Charles Darwin did). For example, snakes and spiders are said to evoke fear. So are situations that involve escaping from a predator. Sometimes scientists stipulate that certain actions indicate the presence of fear, such as freezing or fleeing or even attacking in defense. The conclusions that scientists draw about emotions depend on the validity of their initial assumptions. It is noteworthy that when it comes to emotions, scientists use exactly the same categories as non-scientists, which may give us cause for concern, as forewarned by William James (James, 1890, 1894).20

Inferences about emotional episodes can also come from other people, for example independent samples of study participants, who categorize the situations in which facial movements are observed. Scientists can ask observers to infer when participants are emotional by having them judge subjects’ behavior or tone of voice; for example, see our discussion of Camras et al. (2007) in the section on infants and children, below.

A third common strategy to identify the emotional state of participants is to simply ask them what they are experiencing. Their self-reports of emotional experience then become the criteria for deciding whether an emotional episode is present or absent. Self-reports are often considered imperfect measures of emotion because they depend on subjective judgements and beliefs and require translation into words. In addition, a person can be experiencing an emotional event yet be unaware of it and therefore unable to report on it (i.e., a person can be conscious but unaware of their experience and unable to report it), or may be unable to express how they feel using emotion words, a condition known as alexithymia. Despite questions about their validity, self-reports are the most common measure of emotion that scientists compare to facial AUs.

Human inference and assessing the presence of an emotional state.

At this point, it should be obvious that any measure of an emotional state, to which measurements of facial muscle movements can be compared, itself requires some degree of human inference; what varies is the amount of inference that is required. Herein lies a problem: To properly test the hypothesis that certain facial movements reliably and specifically express emotion, scientists (ironically) must first make a reverse inference that an emotional event is occurring – that is, they infer the emotional instance by observing changes in the body, brain, and behavior (e.g., only if blood pressure consistently and uniquely rises in anger can a rise in blood pressure be used as a marker of anger). Or they infer (a reverse inference) that an event or object evokes an instance of a specific emotion category (e.g., an electric shock elicits fear but not irritation, curiosity, or uncertainty). These reverse inferences are scientifically sound only if measures of emotion reliably, specifically and validly characterize the instances of the emotion category. So, any clear, scientific test of the common view of emotional expressions rests on a set of more basic inferences about whether an emotional episode is present or absent, and any conclusions that come from such a test are only as sound as those basic inferences.

If all measures of emotion (to which measurements of facial muscle movements are compared) rest on human judgment to some degree, then, in principle, this prevents a scientist from being sure that an emotional state is present, which in turn limits the validity of any experiment designed to test whether a facial configuration validly expresses a specific emotion category. All face-emotion associations that are observed in an experiment reflect human consensus, i.e., the degree of agreement between self-judgments (of the participants), expert-judgments (of the scientist), and/or judgments of other observers (of perceivers who are asked to infer emotion in the participants). These types of agreement are often incorrectly referred to as accuracy. We touch on this point again when we discuss studies that test whether certain facial configurations are routinely perceived as expressions of anger, disgust, fear, and so on.

Testing the common view of emotional expressions: Interpreting the scientific observations.

If a specific facial configuration reliably expresses instances of a certain emotion category in any given experiment, then we would expect measurements of the face (e.g., facial AU codes) to co-occur with measurements that indicate that participants are in the target emotional state. In principle, those measures might be more objective, such as ANS changes during an emotional event, or they might be more subjective, deriving from the scientist, from other perceivers who make judgments about the study participants, or from the participants themselves. In practice, however, most experiments compare facial movements to subjective measures of emotion -- a scientist’s judgment about which emotions are evoked by a particular stimulus, perceivers’ judgments about participants’ emotional states, or participants’ self-reports of emotional experience -- because ANS and other more objective measurements do not themselves distinguish one emotion category from another in a reliable and specific way. For example, in an experiment, scientists might ask: Do the AUs that create a scowling facial configuration co-occur with self-reports of feeling angry? Do the AUs that create a pouting facial configuration co-occur with perceivers’ judgments that participants are sad? Do the AUs that create a wide-eyed gasping facial configuration co-occur when people are exposed to an electric shock? And so on. If such observations suggest that a configuration of muscle movements is reliably observed during episodes of a given emotion category, then those movements are said to express the emotion in question. As we will see, many studies show that some facial configurations occur more often than random chance, but are not observed with a high degree of reliability (according to the criteria from , outlined in Table 2 and Figure 3).

If a specific facial configuration specifically (i.e., uniquely) expresses instances of a certain emotion category in any given experiment, then we would expect to observe little co-occurrence between measurements of the face and measurements indicating the presence of emotional instances from other categories, except what would be expected by chance (again, see Table 2 and Figure 3). For example, in an experiment, scientists might ask: Do the AUs that create a scowling facial configuration co-occur with self-reports of feeling sad or confused, or with social motives such as dominance? Do the AUs that create a pouting facial configuration co-occur with perceivers’ judgments that participants are angry or afraid? Do the AUs that create a wide-eyed gasping facial configuration co-occur when people are exposed to a competitor whom they are trying to scare? And so on.
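The reliability and specificity questions in the two preceding paragraphs can be made concrete with a small calculation. The following is only a minimal sketch under stated assumptions, not the procedure used in any of the studies reviewed here: it assumes a hypothetical experiment in which every trial is coded for whether the target facial configuration was observed and whether the criterion measure (for example, a self-report) indicated the target emotion. The function name `cooccurrence_rates` and all counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CooccurrenceRates:
    reliability: float          # P(configuration observed | target emotion present)
    false_positive_rate: float  # P(configuration observed | target emotion absent)


def cooccurrence_rates(face_and_emotion: int, no_face_and_emotion: int,
                       face_no_emotion: int, no_face_no_emotion: int) -> CooccurrenceRates:
    """Summarize a 2x2 table of trial counts (hypothetical coding scheme)."""
    emotion_trials = face_and_emotion + no_face_and_emotion
    other_trials = face_no_emotion + no_face_no_emotion
    if emotion_trials == 0 or other_trials == 0:
        raise ValueError("need at least one trial with and one without the target emotion")
    return CooccurrenceRates(
        reliability=face_and_emotion / emotion_trials,
        false_positive_rate=face_no_emotion / other_trials,
    )


# Invented counts: a scowl is coded on 30 of 100 "anger" trials
# and on 20 of 100 trials where the criterion measure indicates another state.
rates = cooccurrence_rates(30, 70, 20, 80)
print(f"reliability = {rates.reliability:.2f}")
print(f"false positive rate = {rates.false_positive_rate:.2f}")
```

In this invented example the scowl appears somewhat more often with the target emotion than without it, yet a scowl by itself would be a weak basis for inferring anger, because it also appears on one in five of the other trials. Both rates are only as trustworthy as the criterion measure used to decide which trials count as instances of the emotion.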

If a configuration of facial movements is observed in instances of a certain emotion category in a reliable, specific way within an experiment, so that we can infer that the movements are expressing an instance of the emotion in that study as hypothesized, then scientists can safely infer that the facial movements in question are an expression of that emotion category’s instances in that situation. One more step is required before we can infer that the facial configuration is the expression of that emotion: we must observe a similar pattern of facial configuration-emotion co-occurrences across different experiments, to some extent generalizing across the specific measures and methods used and the participants and contexts sampled. If the facial configuration-emotion co-occurrences replicate across experiments that sample people from the same culture, then the facial configuration in question can reasonably be referred to as an emotional expression only in that culture; e.g., if a scowling facial configuration co-occurs with measures of anger (and only anger) across most studies conducted on adult participants in the US who are free from illness, then it is reasonable to refer to a scowl as an expression of anger in the US. If facial configuration-emotion co-occurrences generalize across cultures – that is, replicate across experiments that sample a variety of instances of that emotion category in people from different cultures -- then the facial configuration in question can be said to universally express the emotion category in question.

Studies of Healthy Adults from the U.S. and Other Developed Nations

We now review the scientific evidence from studies that document how people spontaneously move their facial muscles during instances of anger, disgust, fear, happiness, sadness and surprise, and how they pose their faces when asked to indicate how they express each emotion category. We examine evidence gathered in the lab and in naturalistic settings, sampling healthy adults who live in a variety of cultural contexts. To evaluate the reliability, specificity and generalizability of the scientific findings, we adapted criteria set out by , as discussed in Table 2.

Spontaneous facial movements in laboratory studies.

A meta-analysis was recently conducted to test the hypothesis that the facial configurations in Figure 4 co-occur, as hypothesized, with specific emotion categories (Duran et al., 2017). This analysis was published in a book chapter. Thirty-seven published articles reported on how people moved their faces when exposed to objects or events that evoke emotion. Most studies included in the meta-analysis were conducted in the laboratory. The findings from these experiments were statistically summarized to assess the reliability of facial movements as expressions of emotion (see Figure 5). In all emotion categories tested, other than fear, participants moved their facial muscles into the expected configuration more consistently than what we would expect by chance. Consistency levels were weak, however, indicating that the proposed facial configurations in Figure 4 have limited reliability (and to some extent, limited generalizability; i.e., a scowling facial configuration is an expression of anger, but not the expression of anger). More often than not, people moved their faces in ways that were not consistent with the hypotheses of the common view. An expanded version of this meta-analysis () analyzed 89 effect sizes from 47 studies totaling 3599 participants, with similar results: the hypothesized facial configurations were observed, with average effect sizes of r = .32 (for the average correlation between the intensity of a facial configuration and a measure of emotion, with correlations for specific emotion categories ranging from .25 to .38, corresponding to weak evidence of reliability) and proportion = .19 (for the average proportion of the times that a facial configuration was observed during an emotional event, with proportions for specific emotion categories ranging from .15 to .25, interpreted as no evidence to weak evidence of reliability).21

Figure 5:

Meta-analysis of facial movements during emotional episodes: A summary of effect sizes across studies (Duran et al., 2017).

Effect sizes are computed as correlations or proportions (as reported in the original experiments). Results include experiments that reported a correspondence between a facial configuration and its hypothesized emotion category and those that reported a correspondence between individual AUs of that facial configuration and the relevant emotion category; meta-analytic summaries for entire ensembles of AUs only (the facial configurations specified in Figure 2) were even lower than those that appear here.

No overall assessment of specificity was reported in either the original or the expanded meta-analysis because most published studies do not report the false positive rate (i.e., the frequency with which a facial AU is observed when an instance of the hypothesized emotion category was not present; see Figure 3). Nonetheless, some striking examples of specificity failures have been documented in the scientific literature. For example, a certain smile, called a “Duchenne” smile, is defined in terms of facial muscle contractions (i.e., in terms of facial morphology): it involves movement of the orbicularis oculi, which raises the cheeks and causes wrinkles at the outer corners of the eyes, in addition to movement of the zygomaticus major, which raises the corners of the lips into a smile. A Duchenne smile is thought to be a spontaneous expression of authentic happiness. Research shows that a Duchenne smile can be intentionally produced when people are not happy, however (; Gunnery et al., 2013; also see ), consistent with evidence that Duchenne smiles often occur when people are signaling submission or affiliation rather than solely reflecting happiness (Rychlowska et al., 2017).

Spontaneous facial movements in naturalistic settings.

Studies of facial configuration-emotion category associations in naturalistic settings tend to yield similar results to studies that were conducted in more controlled laboratory settings (Fernandez-Dols, 2017; ). Some studies observe that people express emotions in real world settings by spontaneously making the facial muscle movements proposed in Figure 4, but such observations do not replicate well across studies (e.g., compare vs. Crivelli, Carrera and Fernandez-Dols, 2015; vs. ). For example, two field studies of winning judo fighters recently demonstrated that so-called “Duchenne” smiles were better predicted by whether an athlete was interacting with an audience than by the degree of happiness reported after winning their matches (). Only eight of the 55 winning fighters produced a “Duchenne” smile in Study 1; all occurred during a social interaction. Only 25 out of 119 winning fighters produced a “Duchenne” smile in Study 2, documenting, at best, weak evidence for reliability.

Posed facial movements.

Another source of evidence comes from asking participants sampled from various cultures to deliberately pose the facial configurations that they believe they use to express emotions. In these studies, participants are given a single emotion word or a single, brief statement to describe each emotion category and then asked to freely pose the expression that they believe they make. In this way, they directly examine common beliefs about emotional expressions. For example, one study provided college students from Canada and Gabon (in Central Africa) with dictionary definitions for ten emotion categories. After practicing in front of a mirror, participants posed the facial configurations so that “their friends would be able to understand easily what they feel” and their poses were FACS coded (Elfenbein et al., 2007, p. 134). Similarly, a recent study asked college students in China, India, Japan, Korea, and the US, to pose the facial movements they believe they make when expressing each of 22 emotion categories (). Participants heard a brief scenario describing an event that might cause anger (“You have been insulted, and you are very angry about it”) and then were instructed to pose a facial (and non-verbal, vocal) expression of emotion, as if the events in the scenario were happening to them. Experimenters were present in the testing room as participants posed their responses. Both studies found moderate to strong evidence for a cross-cultural, common expressive pose for anger, fear, and surprise categories, and weak to moderate evidence for the happiness category, with cultural variation around those common poses; the findings were weaker for disgust and sadness categories (Figure 6).

Figure 6:

Comparing posed and spontaneous facial movements.

Results from Table 6, Cordaro et al. (2017): degree of overlap between the hypothesized configuration of facial movements for each emotion category and the “International Core Patterns” derived from participants’ expressive poses. Gabonese participants in Elfenbein et al. (2007): reliability for the anger category is for AU4 + AU5 only. Proportion data only from Duran et al. (2017).

Neither study compared participants’ posed expressions to observations of how they actually moved their faces when expressing emotion. Nonetheless, a quick comparison of the findings from both studies and the proportions of spontaneous facial movements made during emotional events (from the Duran et al. (2017) meta-analysis) makes it clear that posed and spontaneous movements differ, sometimes quite substantially (again, see Figure 6). When people pose a facial configuration that they believe expresses an emotion category, they make facial movements that more reliably agree with the hypothesized facial configurations in Figure 6. The same cannot be said of people’s spontaneous facial movements during actual emotional episodes, however (for convergent evidence, see ; Namba et al., 2016). One possible interpretation of these findings is that posed and spontaneous facial muscle configurations correspond to distinct communication systems. Indeed, there is some evidence that volitional and involuntary facial movements are controlled by different circuits in the skeletomotor system (Rinn, 1984). Another factor that may contribute to the discrepancy between posed and spontaneous facial movements is that people’s beliefs about their own behavior often reflect their stereotypes or beliefs and do not necessarily correspond to how they actually behave in real life (see ).


Our review of the available evidence thus far is summarized in the first through third data rows in Table 4. The hypothesized facial configurations presented in Figure 4 spontaneously occur with weak reliability during instances of the predicted emotion category, suggesting that they sometimes serve to express the predicted emotion. Furthermore, the specificity of each facial configuration as an expression of a specific emotion category is largely unknown (because it is typically not reported in many studies). In our view, this pattern of findings is most compatible with the interpretation that hypothesized facial configurations are not made reliably or specifically enough to use them to infer a person’s emotional state. We are not suggesting that facial movements are meaningless and devoid of information. Instead, the data suggest that the meaning of any set of facial movements may be much more variable and context-dependent than hypothesized by the common view.

Table 4:

Reliability and specificity: A summary of the evidence

|  | Reliability | Specificity |
| --- | --- | --- |
| Expression Production |  |  |
| Adults, Developed, Spontaneous, Lab | weak | unknown |
| Adults, Developed, Spontaneous, Naturalistic | weak | unknown |
| Adults, Developed, Posed | weak to strong | unknown |
| Adults, Remote, Spontaneous | unclear | unknown |
| Adults, Remote, Posed | weak to strong | unknown |
| Newborns, Infants, Toddlers | unsupported | unsupported |
| Congenitally Blind | unsupported to weak | unsupported |
| Emotion Perception |  |  |
| Adults, Developed, Choice-From-Array | moderate to strong | unknown |
| Adults, Developed, Reverse Correlation (with Choice-From-Array) |  |  |
| Adults, Developed, Free-Labeling | weak to moderate | weak |
| Adults, Developed, Virtual Humans | unknown | unknown |
| Adults, Remote, Choice-From-Array (before 2008) | moderate to strong | unknown |
| Adults, Remote, Choice-From-Array (after 2008) | weak to moderate | unsupported |
| Adults, Remote, Free-Labeling (before 2008) | unsupported to strong | variable |
| Adults, Remote, Free-Labeling (after 2008) | unsupported | unsupported |
| Infants, Young Children | unsupported | unsupported |

Note. Criteria were adopted from , who suggest that reliability rates of 70% to 90% are considered strong evidence for universal emotion perception (following Ekman, 1994a); presumably, this would also hold for studies of expression production. Weak evidence is in the range of 20% to 40% (following Russell, 1994). By interpolation, reliability between 41% and 69% would be considered moderate evidence for reliability. Reliability estimates below 20% are interpreted as findings that clearly do not support the reliability hypothesis. We also adopted these criteria for specificity findings. Developed = studies of participants from the U.S. and other more urban countries. Spontaneous = spontaneous facial movements. Posed = posed facial configurations. Remote = studies of participants from small-scale, remote samples.
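The thresholds in this note can be written as a simple lookup. The sketch below is illustrative only: the function name `evidence_label` is hypothetical, and two boundary choices are assumptions that the note does not settle, namely that rates between 40% and 41% count as moderate and that rates above 90% count as strong. It is applied to the counts reported for the two field studies of winning judo fighters.

```python
def evidence_label(percent: float) -> str:
    """Map a reliability (or specificity) rate, in percent, to an evidence label."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 20:
        return "unsupported"   # below 20%: does not support the hypothesis
    if percent <= 40:
        return "weak"          # 20% to 40%
    if percent < 70:
        return "moderate"      # assumption: anything above 40% and below 70%
    return "strong"            # assumption: 70% and above, including above 90%


# Counts of winning fighters who produced a "Duchenne" smile, as reported in the text.
for study, smiles, fighters in [("Study 1", 8, 55), ("Study 2", 25, 119)]:
    pct = 100 * smiles / fighters
    print(f"{study}: {pct:.1f}% -> {evidence_label(pct)}")
```

Under these assumptions the first study falls below the 20% threshold and the second falls just inside the weak range, which agrees with the description of these findings as, at best, weak evidence for reliability.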

Studies of Healthy Adults Living in Small-Scale, Remote Cultures

The emotion categories that are at the heart of the common view -- anger, disgust, fear, happiness, sadness and surprise -- were derived from modern US English (Wierzbicka, 2014) and their proposed expressions (in Figure 4) derive from observations of people who live in urbanized, Western settings. Nonetheless, it is hypothesized that these facial configurations evolved as emotion-specific expressions to signal socially-relevant emotional information () in the challenging situations that originated in our hunting and gathering hominin ancestors who lived on the African savannah during the Pleistocene era (Pinker, 1997; Tooby & Cosmides, 1990). It is further hypothesized that these facial configurations should therefore be observed during instances of the predicted emotion categories with strong reliability and specificity in people around the world, although the facial movements might be slightly modified by culture (Cordaro et al., 2017; Ekman, 1972). The strongest test of these hypotheses would be to sample participants who live in remote parts of the world with relatively little exposure to western cultural norms, practices and values (; Henrich et al., 2010) and observe their facial movements during emotional episodes.22 In our evaluation of the evidence, we continued to use the criteria summarized by ; see Table 2).

Spontaneous facial movements in naturalistic settings.

Our review of scientific studies that systematically measure the spontaneous facial movements in people of small-scale, remote cultures is brief by necessity: there aren’t any. At the time of publication, we were unable to identify even a single published report or manuscript registered on open-access, pre-print services that measured facial muscle movements in people of remote cultures as they experienced emotional events. Scientists have almost exclusively observed how people label facial configurations as emotional expressions (i.e., they study emotion perception, not production) to test the hypothesis that certain facial configurations evolved to express certain emotion categories in a reliable, specific and generalizable (i.e., universal) manner. Later in the paper we return to this issue and discuss the findings from these emotion perception studies.

There are nonetheless several descriptive reports that provide support for the common view of universal emotional expressions (similar to what Valente et al., 2017 refer to as an “observational approach”). For example, the US psychologist Paul Ekman and colleagues curated an archive of photographs of the Fore hunter-gatherers taken during his visits to Papua New Guinea in the 1960s (Ekman, 1980). The photographs were taken as people went about their daily activities in the small hamlets of the eastern highlands of Papua New Guinea. Ekman used his knowledge of the situation in which each photograph was taken to assign each facial configuration to an emotion category, leading him to conclude that the Fore expressed emotions with the proposed facial configurations shown in Figure 4. Yet different scientific methods yielded a contrasting conclusion. When Trobriand Islanders living in Papua New Guinea were asked to infer emotions in facial configurations by labeling these photographs in their native language, both by freely offering words and by choosing the best fitting emotion word from a list of nine choices, they did not label the facial configurations as proposed by Ekman and colleagues at above chance levels (Crivelli et al., 2017).23 In fact, the proposed fear expression -- the wide-eyed gasping face -- is actually interpreted as an expression of threat (intent to harm) and anger by the Maori of New Zealand and by the Trobriand Islanders in remote Papua New Guinea ().

A compendium of spontaneous human behavior published by the Austrian ethologist Irenäus Eibl-Eibesfeldt (Eibl-Eibesfeldt, 1989) is sometimes cited as evidence for the hypothesis that certain facial movements are universal signals for specific emotion categories. No systematic coding procedure was used in his investigations, however. Upon close examination, Eibl-Eibesfeldt’s detailed descriptions appear to be more consistent with the studies of people from more industrialized cultures that we reviewed above: people move their faces in a variety of ways during episodes belonging to the same emotion category. For example, as reported by Eibl-Eibesfeldt, a rapid eyebrow raise (called an eyebrow flash) is thought to express friendly recognition in some, but not all, cultures. This movement would be coded with FACS AU 1 (inner brow raise) and AU 2 (outer brow raise) that are part of the proposed expressions for surprise and fear (Ekman et al., 1983), sympathy () and awe (Shiota et al., 2003). Even Eibl-Eibesfeldt acknowledged that eyebrow flashes were not unique expressions of specific emotion categories, writing that they also served as a greeting, to invite social contact, as a sign of thanks, an initiation of flirting, and a general indication of “yes” in Samoans and other Polynesians, in the Eipo and Trobriand islanders in Papua New Guinea, and in the Yanomami of South America. In Japan, eyebrow flashes are considered an impolite way for adults to greet one another. In the US and Europe, an eyebrow flash was observed when friends, but not strangers, greeted one another.

Posed facial movements.

One study read a brief emotion story to people who live in the remote Fore culture of Papua New Guinea and asked each person to “show how his face would appear” if he was the person described in the emotion stories (Ekman, 1972, p. 273; sample size was not reported). Videotapes of nine participants were shown to 34 US college students who were asked to judge which emotion was being expressed. US participants were asked to infer the emotional meaning of the facial poses by choosing an emotion word from six choices provided by the experimenter (called a choice-from-array task, see Table 5). Participants inferred the intended emotional meaning above chance guessing for smiling (happiness, 73%), frowning (sadness, 68%), scowling (anger, 51%), and nose-wrinkling (disgust, 46%), but not for surprise and fear (27% and 18% respectively).
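The comparison with chance in this study can be sketched as simple arithmetic. The example below assumes that chance responding means guessing uniformly among the six emotion words, which gives a baseline of about 16.7%; the variable names are illustrative. It reports only the numerical margin over that baseline. It is not a significance test, which would require the number of judgments behind each percentage, and that number is not given here.

```python
N_CHOICES = 6                 # six emotion words offered in the choice-from-array task
chance = 100 / N_CHOICES      # percent agreement expected from uniform guessing (assumption)

# Agreement percentages as reported in the text for each posed facial configuration.
reported = {
    "happiness": 73,
    "sadness": 68,
    "anger": 51,
    "disgust": 46,
    "surprise": 27,
    "fear": 18,
}

# Numerical margin over the chance baseline, in percentage points.
margins = {emotion: pct - chance for emotion, pct in reported.items()}

print(f"chance baseline: {chance:.1f}%")
for emotion, margin in margins.items():
    print(f"{emotion:<9} {reported[emotion]:>3}%  ({margin:+.1f} points relative to chance)")
```

The margins for surprise and fear are small compared with those for the other four categories, which fits the conclusion that agreement for those two poses did not reliably exceed guessing even though both percentages sit numerically above the baseline.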

Table 5:

Common tasks for measuring explicit emotion perception

ConcernsAdditional Observations
General Considerations
Test-retest reliability is rarely evaluated but is critical. A number of contextual factors are known to influence judgments, including a perceiver’s internal state.Test-retest assessments are rarely done for practical reasons.
Participants are typically asked to infer emotional meaning in exaggerated facial configurations. This reduces the ecological validity of the findings for how people infer emotional meaning in faces in the real world. The facial configurations used in most experiments (see Figure 4) are caricatures – they are exaggerated to maximally distinguish one from the another. Caricatures are easier to label (categorize) than are typical stimuli, particularly when the categories in question are highly interrelated ().Exaggerated facial configurations have greater “source clarity” ()
Participants are typically asked to infer emotional meaning in highly selected facial configurations.In early studies, a smaller set of exaggerated facial configurations were culled from much larger sets of posed faces (involving several thousand faces; for a discussion, see ; Russell, 1994).
Participants are typically asked to infer emotional meaning in static, non-moving facial configurations (i.e., in photographs rather than movies). This reduces the ecological validity of the findings for how people infer emotional meaning in faces in the real world. In the real world, people have to infer when a set of movements begin and end; this is called discrimination or detection).There is information in the dynamics of facial movements (; ), but dynamic facial movements, particularly when they are spontaneous, do not always produce higher levels of agreement in emotion perception studies. Dynamic movements add realism, intensity and improve levels of agreement primarily when movements are degraded or are artificial
Participants are typically asked to infer emotional meaning in posed, rather than spontaneous, facial configurations.Spontaneous or candid facial configurations typically produce much lower levels of agreement in emotion perception studies (e.g., ; ).
Only a single task used in most experiments (i.e., participants are asked to infer emotion in facial configurations via one method of responding). Ideally, multiple tasks should be used with the same population of participants to see if convergent results are obtained.This approach is rarely taken, but for an example, see Crivelli et al., 2016; Gendron et al., 2014; Gendron et al., 2018).
Most experiments ask participants to infer emotion in a disembodied face, alone, without context. This reduces the ecological validity of the findings for how people infer emotional meaning in faces in the real world.A growing number of experiments now show that context is an important, and sometimes dominant, source of information when people infer emotional meaning in a facial configuration. See Box 3 in SOM. For example, Situational information tends to dominate perception of emotion in faces both when situations are common, everyday () and even when situations are more ambiguous than the exaggerated facial configurations being judged (, Study 3).
Many studies do not report evidence about the specificity of emotion perceptions, or the frequency with which people attribute a non-intended emotional meaning to a facial configuration.
Until recently, the large majority of experiments included only one pleasant emotion category (happiness) among several unpleasant emotion categories (anger, fear, sadness, etc.). This may be one reason that agreement rates are so high for smiles. In the last few years, experiments have begun to include a larger variety of pleasant emotion categories (pride, awe, gratitude, etc.), but there continues to be debate over whether these emotion categories are expressed with consistent, specific facial configurations.
Choice-From Array: matching photos of facial configurations and emotion words (with or without brief stories)
Response options are limited to those provided in the task
Words influence how the brain processes visual inputs from faces (e.g., Gendron et al., 2012; ). Stories can prime action perceptions, as well (Gendron et al., in press). More generally, choice-from-array tasks have been shown to encourage biased perceptual responding using a signal detection analysis (e.g., DeCarlo 2012). Choice-from-array tasks are easy and efficient.
The fact that participants are exposed to the same facial configurations and emotion words over and over allows them to learn the intended pairings even if they don’t know them to begin with ().
An emotion word does not necessarily have a unique correspondence to a single emotion category for all people in a given culture (i.e., they may differ in emotional granularity; Barrett, 2004, 2017; ) or for people from different cultures. Concerns about individual word meaning are why choice-from-array using stories is preferable. Also, choice-from-array tasks are usually straightforward for participants to understand.
A small range of answers is pre-determined by the experimenter, making it easier for participants to provide the answers scientists expect. For example, when the words participants were allowed to choose from were constrained, frowns were consensually labeled as fear and wide-eyed gasping faces were labeled as surprise (Russell, 1993). Scowling faces are more likely to be perceived as fearful when paired with the description of danger (, Study 1) and appear determined or puzzled depending on the story they are presented with (, Study 2). Choice-from-array responses are easy for scientists to score. Most studies using continuous judgments (rather than forced choice) find that participants do not infer emotional meaning in facial configurations in a yes/no or on/off sort of way (Russell, 1994).
People are asked to make yes/no decisions about assigning a facial configuration to an emotion category. Multiple emotion words may apply to a single configuration (i.e., people might infer more than one emotional meaning in a face), but the option to infer multiple emotional meanings is rarely given to participants. Continuous judgments, such as on a Likert-type scale ranging from one to seven, would solve both of these problems, and also allow analysis of the similarity among facial configurations (which evidence shows is important, e.g., Jack et al., 2016; ). Similarity allows scientists to discover the emotional meanings that people implicitly assign to a facial configuration, rather than having people explicitly state them (see further discussion of similarity below).
A participant might decide that no emotion word provided applies to a facial configuration, but the option to respond this way is rarely given to participants (they are usually forced to choose an emotion word; for discussion, see ). See Cordaro et al. (2016) for an example of this design feature.
If a participant hears a story and is asked to choose between two faces (e.g., a scowl and a smile), she can give the expected answer (e.g., scowl) simply by figuring out that the smile is NOT correct. For example, after hearing a story about anger, a participant is shown a scowl and a smile and can choose the scowl merely by realizing the smile is not correct (on the basis of valence). This is similar to getting an answer right on a multiple-choice test by eliminating all the alternatives: you don’t actually know the right answer, but you figured it out because of the structure of the task. A similar point can be made about showing a single face and asking participants to label it with a word by selecting from among a small set of options. Participants use a process-of-elimination strategy: words that are not chosen on prior trials are selected more frequently, inflating agreement levels (). If a participant hears a story about anger and must choose between a scowl and a smile, she can figure out that the scowl is correct merely because she is distinguishing between negative (scowl) and positive (smile). If a participant hears a story about anger and must choose between a scowl and a frown, he can figure out that the scowl is correct merely because he is distinguishing between high arousal (scowl) and low arousal (frown).
In tasks that involve brief stories or vignettes about emotion, only one typical story is offered for each emotion category, making it more difficult to observe any variation within a category.
Free Sorting: photos of facial configurations are sorted into groupings, such that each grouping represents a category
Face-to-Cue Matching: matching photos of facial configurations to a recording of posed vocalization
Most participants still spontaneously use words to guide their sorting and organize their groupings. Ideal for preverbal participants or those with semantic deficits (e.g., Lindquist et al., 2014).
Similarity Judgments Between Pairs of facial configurations
Perceptual Matching: Indicating whether or not two photos of facial configurations belong to the same emotion category
It is inefficient and time consuming to judge the similarity of all pairs of facial configurations. For a set of 100 faces, this requires roughly (100*100)/2 = 5,000 different similarity judgments (4,950 if each unordered pair is judged exactly once). Participants can arrange face stimuli on a computer screen and all pairwise similarity judgments can be computed (the SPAM method proposed by Goldstone, 1994; e.g., see Hout et al., 2013). This procedure also solves the problem that, when face pairs are presented sequentially, the same pair of stimuli can have a different judged similarity depending on which item is presented first (the similarity of A vs. B is not always judged to be the same as B vs. A; Tversky, 1977). Other advantages are that categories can be discovered, rather than prescribed, and verbal associations are minimized. Analyses of similarity judgments typically yield more continuous similarity relations between emotion categories along affective dimensions (see ).
Free-Labeling: photos of facial configurations are labeled with words offered by participants (unconstrained by experimenter)
Forcing people to translate faces into words is not a good match, since much of the information from faces cannot be easily captured in words (Ekman, 1994). This is not a special criticism of free labeling studies; it applies to all studies that ask people to label a face with words, including the choice-from-array tasks.
Facial expressions did not evolve to represent specific verbal labels (Ekman, 1994, p. 270). “Regardless of the language, of whether the culture is Western or Eastern, industrialized or preliterate, these facial expressions are labeled with the same emotion terms: happiness, sadness, anger, fear, disgust, and surprise” (Ekman, 1972, p. 278).
There is no widely accepted method for categorizing freely provided responses (Ekman, 1994, p. 274). Most scientists group together similar words (synonyms), so that a variety of words can be used to show evidence of a correct response (e.g., a frowning face, which is the proposed expression for sadness, could be labeled as "sad," "grieving," "disappointed," "blue," "despairing," and so on). Scientists routinely use databases that indicate synonyms, like WordNet (used in ). Also, it is possible to do data-driven groupings of emotion words into semantic categories (e.g., Jack et al., 2016; Shaver et al., 1987). The more serious problem is that early studies using free-labeling (e.g., ; Izard, 1971) did not provide enough information in the method sections about how freely provided labels were grouped.
Using freely chosen labels in a study of different cultures is difficult because it may be hard to find adequate translations (Ekman, 1994, p. 274). A given emotion word, like sadness, can correspond to different emotion concepts (with different features) in different languages (e.g., Wierzbicka, 1986, 2014). A single emotion word in one language can refer to more than one concept in another language (e.g., Pavlenko, 2014). Some languages have no one-to-one translation for English emotion words and some emotion concepts in other languages are not directly translatable into English emotion words (see Barrett, 2017; Russell, 1991; Jack et al., 2016). This is not a special criticism of free labeling studies; it holds for any experiment that uses emotion words requiring translation, including choice-from-array tasks. A standard solution to this problem is to use both forward and backward translation (e.g., a word spoken in Hadzane is translated into English and then back translated into Hadzane; if there is no broken telephone, then the translation has fidelity). An even better method is to elicit features for the emotion words in question, including typicality of those features, to determine the fidelity of translation (e.g., de Mendoza et al., 2010). Scientifically, issues with translation are manageable if scientists allow phrases to stand in for specific words.
Using only single words will always fail to capture much of the rich information in faces. Participants often provide multiple words or even longer descriptions of situations, behaviors, or behaviors in situations (e.g., see Gendron et al., 2014; Russell, 1994). Such data are time consuming to code and analyze.
Even when participants are told that photographs are of people trying to express an emotion, they often offer non-emotion labels. For example, Izard (1971) found that people offered labels such as deliberating, clowning, skepticism, pain, and so on (as reported in Russell, 1994). This is not necessarily evidence that participants did not understand the task asked of them. It might be evidence that these facial configurations are not specific for expressing emotions.

Note. Response tasks are arrayed in order from those that constrain participants’ responses most, making it difficult to observe evidence that can disconfirm commonsense beliefs about emotion, to those that are least constrained, making it easier to observe variation and disconfirm commonsense beliefs. Choice-from-array = participants are shown a facial configuration and asked to infer its emotional meaning by choosing an emotion word from a small set of words; or, participants are presented with an emotion word that labels an emotion category (e.g., sadness) or a brief story about a typical instance of an emotion category (e.g., “the boy’s much loved dog just died and he is sad”) along with two or three photographs of faces (typically posed into one of the configurations presented in Figure 4) and are then asked to choose the facial configuration that they judge best matches the emotional episode described in the word or vignette. Typically, each emotion category is represented by a single scenario. Free sorting = participants are given photographs of facial configurations and asked to sort them into emotion categories by making piles on a big table or on a computer screen. Pairwise similarity judgments = participants rate the similarity of all possible pairs of face stimuli (e.g., on a scale of 0-6). For detailed design concerns about choice-from-array tasks, see Russell (1994, 1995).
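
The table's estimate of about 5,000 similarity judgments for a set of 100 faces can be checked with a short calculation. The following is a minimal sketch, not part of the original paper; the function names and the stimulus labels are hypothetical and introduced only for illustration. It counts the judgments needed when each pair is rated once, and when presentation order matters (the A-then-B versus B-then-A asymmetry attributed above to Tversky, 1977).

```python
from itertools import combinations
from math import comb, perm


def unordered_pairs(n_stimuli: int) -> int:
    """Pairs to judge when A-B and B-A count as a single judgment."""
    return comb(n_stimuli, 2)


def ordered_pairs(n_stimuli: int) -> int:
    """Pairs to judge when presentation order matters (A-then-B and B-then-A)."""
    return perm(n_stimuli, 2)


if __name__ == "__main__":
    n_faces = 100  # the set size used as the example in the table
    # Hypothetical stimulus labels, for illustration only.
    faces = [f"face_{i:03d}" for i in range(n_faces)]
    # Enumerate every unordered pair, as a trial list for sequential judging would.
    trial_list = list(combinations(faces, 2))
    print(len(trial_list), unordered_pairs(n_faces), ordered_pairs(n_faces))
    # prints: 4950 4950 9900
```

Because the count grows roughly with the square of the set size, doubling the number of faces roughly quadruples the number of sequential judgments, which is one reason spatial-arrangement procedures are attractive for larger stimulus sets.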


Our review of the available evidence from expression production studies in small-scale, remote cultures is inconclusive because there are no systematic, controlled observations that examine how people who live in these cultural contexts spontaneously move their facial muscles during emotional episodes. The evidence that does exist suggests that common beliefs about emotion may share some similarities across urban and small-scale cultural contexts, but more research is needed before any interpretations are warranted. These findings are summarized in the fourth and fifth data rows of Table 4.

Studies of Healthy Infants and Children

The facial movements of infants and young children provide a valuable way to test common beliefs about emotional expressions because, unlike older children and adults, babies cannot exert voluntary control over their spontaneous expressive behaviors, meaning that they are unable to deliberately mask or portray instances of emotion in accordance with social demands. As a general rule, infants understand far more about the world than what they can easily convey through their physical actions, making it difficult for experiments to distinguish what infants understand from what they can actually do. Experiments must use human inference to determine when an infant is in an emotional state, as is the case in studies of adults (see Human inference and assessing the presence of an emotional state). The presence (or absence) of an instance of emotion is inferred (i.e., stipulated), either by a scientist (who exposes a child to something that is presumed to evoke an emotion episode) or by adult “raters” who infer the emotional meaning of the evoking situation or the child’s body movements and vocalizations (see Subjective measures of an emotional instance). In the latter cases, inferences are measured by asking research participants to label the situation or the child’s emotional state by choosing an emotion word or image from a small set of options, a task known as choice-from-array. We address the strengths and weaknesses of choice-from-array tasks (see Table 6) and the potential risk of confirmatory bias with the use of such methods (see Some observations on interpreting the data, below).

Table 6: Culturally common facial configurations discovered using the reverse correlation method

Facial configuration (AUs) | Associated Emotion Words – | Associated Emotion Words - China
6+12+13+14 | delighted, joy, happy, cheerful, contempt, pride | joyful, delighted, happy, glad, feel well, pleasantly surprised, embarrassed, pride
4+20+24+43 | fear, scared, anxious, upset, miserable, sad, depressed, shame, embarrassed | afraid, anxious, distressed, broken-hearted, sorrow and sadness, having a hard time, grief, dismay, anguish, worry, vexed, unhappy, shame, despise
2+5+26+27 | ecstatic, excited, surprised, frightened, terrified | amazed, greatly surprised, alarmed and panicky, scared, fear
7+9+16+22 | hate, disgust, fury, rage, anger | disgusted, bristle with anger, furious, wild wrath, storm of fury, storm of anger, indignant, rage

Note. Facial configurations extracted using reverse correlation from 62 models of facial configurations. Red coloring indicates stronger AU presence and blue indicates weakest AU presence. Some words and phrases that refer to emotion categories in Chinese are not considered emotion categories in English. Modified from Jack et al. (2016) and reproduced with permission.

With such a strong reliance on human inference, there is a risk that scientists will implicitly confound the measurements made in an experiment with their interpretation of those measurements, in effect over-interpreting infant behavior as reflecting a specific aspect of an emotional event, in part because these young research participants cannot speak for themselves. Some early and influential studies confound the observation of facial movements with their interpreted emotional meaning, leading to the conclusion that babies as young as 7 months of age were capable of producing an expression of anger when, in fact, it is more scientifically correct to say that the babies were scowling. For example, in one study, infants’ facial movements were coded as they were given a cookie, and then the cookie was taken away and placed out of reach although still clearly visible. The babies appeared to scowl when the cookie was removed and not when it was in their mouths (). It is certainly possible that this repeated giving and taking away of the treat angered the infants, but the babies might also have been confused or just generally distressed. Without some independent evidence to indicate that a state of anger was induced, we cannot confidently conclude that certain facial movements in an infant reliably express a specific instance of emotion.

The Stenberg et al. study illustrates some of the design issues that have historically been of concern in many studies with infants. First, emotion-inducing situations are often defined with commonsense intuitions rather than objective evidence (e.g., an infant is assumed to become angry when a cookie is taken away). In fact, it is difficult to know how any individual infant at any point in time will construct and react to such an event. Second, when an infant produces a facial movement, a common assumption is used to infer its emotional meaning without additional measures or controls (e.g., when a scowling facial configuration is observed, it is assumed to necessarily be an expression of infant anger, even if there are no data to confirm that a scowl is specific to instances of anger in an infant). In fact, years later, Campos and his team revised their earlier interpretation of their findings as their research program progressed, later concluding that the facial movements in question (infants lowering and drawing together their brows, staring straight ahead, or pressing their lips together) were more generally associated with unpleasantness and distress, and were not reliable expressions of anger (e.g., Camras, Oster et al., 2007).

The inference problem is particularly poignant when fetuses are studied. For example, a study that used 4-D ultrasonography observed 20-week-old fetuses knitting their brows and described the facial movements as expressions of distress (Dondi et al., 2014). Yet the fetuses were producing these facial movements during situations when fetal distress was unlikely. The brow-knitting was observed during noninvasive ultrasound scanning that did not involve perturbation of the fetus and the pregnant women were at rest. Furthermore, the scans were brief in duration and the facial movements were interspersed with other movements that are typically not thought to express negative emotions, such as smiling and mouthing. This is an example of making a scientific inference about an emotion occurring based solely on the facial movements without converging evidence that the organism in question (a fetus) was in a distressed state. Doing so highlights the common but unsound assumption that certain facial movements reliably index instances of the same emotion category.

The study of expression production in infants and children must deal with other design challenges, in addition to the reliance on human inference, that are shared by experiments employing adult participants. In particular, most experiments observe facial movements in a restricted range of laboratory settings rather than in the wide variety of situations that naturally occur in everyday life. The frequent use of only a single stimulus or event to observe facial movements for each emotion category limits the opportunity to discover whether the expression of an emotion category varies systematically with context.

Even with these design considerations, the scientific findings from studies of infants and children parallel those that we encountered from studies on adults: lack of reliability and specificity in facial muscle movements is the norm, not the exception (again, according to the criteria in Table 2). Although some older studies concluded that infants produce invariant emotional expressions (e.g., ; Izard et al., 1987; Izard et al., 1995; ), these conclusions have been largely overturned by more recent work and in many cases have been reinterpreted and revised by the authors themselves (e.g., ).

Facial movement in fetuses, infants and young children.

The most detailed research on facial movements in fetuses and newborns has focused on smiles. Human fetuses lower their brows (AU4), raise their cheeks (AU6), wrinkle their noses (AU9), crease their nasolabial folds (AU11), pull the corners of their lips (AU12), show their tongues (AU19), part their lips (AU25), and stretch their mouths (AU27), all of which have been implicated, to some degree, in adult laughter. Infants sometimes produce facial movements that resemble adult laughter when they are in distress and pain (Dondi et al., 2014; Hata et al., 2013; Reissland et al., 2011; ; Yan et al., 2006). Within 24 hours of birth, infants raise their cheek muscles in response to being touched (Cecchini et al., 2011). But these movements are not specific to smiling; neonates also raise their cheeks (contract the zygomatic muscle) during rapid eye movement (REM) sleep, when drowsy, and during active sleep (Dondi et al., 2007). A neonatal smile with raised cheeks is caused by brainstem activation (Rinn, 1984), reflecting internally generated arousal rather than expressing or communicating an emotion or even a more general feeling of pleasure (; Sroufe, 1996; Wolff, 1987). So, it remains unclear whether fetal or neonatal facial muscle movements have any relationship to specific emotional episodes, as well as more generally to pleasant feelings or to other social meanings (Messinger, 2002).

In fact, it’s not clear that fetal and neonatal facial movements always have a psychological meaning (consistent with a behavioral ecology view of facial movements; Fridlund, 2017). Newborns appear to produce some combinations of facial movements for muscular reasons. For example, infants produce facial movements associated with the proposed expression for “surprise” (open mouth and raised eyebrows) in situations that are unsurprising, just because opening the mouth necessarily raises their eyebrows; conversely, infants do not consistently show the proposed expressive configuration for surprise in contexts that are likely to be surprising (Camras, 1992; Camras et al., 2017). The facial movement that is part of the proposed expression for sadness (brows oblique and drawn together) occurs when infants attempt to lift their heads to direct their gaze ().

In addition, newborns produce many facial movements that co-occur with fussiness, distress, focused attention, and distaste (Oster, 2005). Newborns react to being given sweet versus sour liquids; for example, newborns make a nose-wrinkle movement, which is part of the proposed expressive configuration for disgust, when given a sour liquid (Ganchrow et al., 1983). However, other studies show that newborns also make this facial movement when given sweet, salty, sour and bitter tastes (e.g., ). Still other studies show that nose-wrinkling does not always occur when infants taste lemon juice (i.e., when that facial movement is expected; Bennett et al., 2002). More generally, infants rarely produce consistent facial movements that cleanly map onto any single emotion category. Instead, infants produce a variety of facial configurations, indicating a lack of emotional specificity ().

There are further examples that illustrate how infant facial movements lack strong reliability and specificity. In a study of 11-month-old babies from the US, China and Japan, infants saw a toy gorilla head that growled (to induce fear) or their arms were restrained (to induce anger; Camras et al., 2007). Observers judged the infants to be fearful or angry based on their body movements; yet, the infants produced the same facial movements in the two situations.24 In another study, one-year-old infants were videotaped in situations where they were tickled (to elicit joy), tasted sour flavors (to elicit disgust), watched a jack-in-the-box (to elicit surprise), had their arm restrained (to elicit anger), and were approached by a masked stranger (to elicit fear) (Bennett, Bendersky, and Lewis, 2002). Infants whose arms were restrained (to purportedly induce an instance of anger) produced the facial actions associated with the proposed facial configuration for an anger expression only 24 percent of the time (low reliability), and instead 80 infants (54%) produced the facial actions proposed as the expression of surprise, 37 infants (25%) produced the facial actions proposed as the expression of joy, 29 infants (19%) produced the facial actions proposed as the expression of fear, and 28 (18%) produced the facial actions proposed as the expression of sadness. This dramatic lack of specificity was observed for all emotion categories studied. An equal number of babies produced facial movements that are proposed as the expressions of joy, surprise, anger, disgust, and fear categories when a sour liquid was placed on infants’ tongues to elicit disgust. When infants faced a masked stranger, only 20 (13%) produced facial movements that correspond to the proposed expression for fear, compared to 56 infants (37%) who produced facial actions associated with the proposed expression for instances of joy.25

Taken together, these findings suggest that infant facial movements may be associated with the affective features of experience, such as distress or arousal, as originally described by Bridges (1932), or communicate a desire to approach or avoid something (e.g., ). Affective features such as valence (ranging from pleasantness to distress) and arousal (ranging from activated to quiescent) are continuous properties of consciousness, just as approach and avoidance are continuous properties of action. These affective features are shared by many instances of different emotion categories, as well as with mental events that are not considered emotional (as discussed in Box 9, in SOM) but are still effective and important for infants.26 Over time, infants likely learn to differentiate mental events with simple affective features into episodes of emotion with additional psychological features that are specific to their socio-cultural contexts, making them maximally effective at eliciting needed responses from their caregivers (Barrett, 2017a; ; ; Witherington et al., 2008).

The affective meaning of an infant’s facial movements may, in fact, be the very property that makes these movements so salient to adult observers. When infants move their lips, open their mouths, or constrict their eyes, adults view infants as feeling more positively or negatively depending upon the context (Bolzani et al., 2005). Infant expressions thus do have a reliable link to instrumental effects in the adults who observe them, playing an important role in parent-infant interaction, attachment and the beginnings of social communication (Atzil et al., 2018; Feldman, 2016). For example, if an infant cries with narrowed eyes, adults rate that infant as feeling more negative, having an unwanted experience, or needing help, but if the infant makes that same eye movement while smiling, adults interpret the infant as experiencing more positive emotion. These data consistently point to the usefulness of facial movements in the communication of arousal and valence (properties of affect; Box 9, SOM). Even when episodes of more specific emotions start to emerge, we don’t yet have evidence that facial movements map reliably and regularly to a specific emotion category.

Young children begin to produce adult-like facial configurations after the first year of life. Even then, however, children’s facial movements continue to lack strong reliability and specificity (Bennett et al., 2002; ; ; Oster, 2005). Examples of a wide-eyed gasping facial configuration, proposed as the expression of fear (see Figure 4), have rarely been observed or reported in young infants (Witherington et al., 2010). Nor do infants reliably produce a scowling facial configuration, proposed as the expression of anger (again, see Figure 4). Infants scowl when they cry or are about to cry (). A frown (mouth corner depression, AU15) is not reliably and specifically observed when infants are frustrated (; Sullivan et al., 2003). A smile (cheek raising and lip corner pulling, AU6 and AU12) is not reliably observed when infants are in visually engaging or mastery situations, or even when they are in pleasant social interactions (Messinger, 2002).

Experiments that observe young children’s facial movements in naturalistic settings find largely the same results as those conducted in controlled laboratory settings. For example, one study trained ethnographic videographers to record a family’s daily activities over four days (Sears et al., 2014). Coders judged whether or not the child from each participating family made a scowling facial configuration (referred to as an expression of anger), a frowning facial configuration (referred to as an expression of sadness), and so on, for the six (presumed) emotion categories included in the study: happiness, sadness, surprise, disgust, fear, and anger. During instances that were coded as anger (defined as situations that included verbal disagreements/sibling bickering, requests for compliance and/or reprimands from parents, parent refusal of child requests, during homework, and sibling provocation), a variety of facial movements were observed, including frowns, furrowed brows, and eye-rolls, as well as a variety of vocalizations, including shouts and whining, and both nonaggressive and aggressive physical behaviors. Perhaps the most telling observation for our purposes is that expressions of anger were more often vocal than facial. During anger situations, children raised their voices 42% of the time, followed by whining about 21% of the time. By contrast, children made scowling facial configurations only 16.2% of the time.27 Yet even during anger situations, the facial movements were predominantly frowning, which can be part of many different proposed facial configurations. The authors reasoned that children engage in specific behaviors to obtain specific goals, and that behaviors such as whining are more likely to attract attention and possibly change parental behavior than will a facial movement. Indeed, it is easier for parents to ignore a negative facial expression than a whining child in the room!
Similar findings of low reliability and specificity for the facial configurations presented in Figure 4 were recently observed in a naturalistic study that videotaped seven- to nine-year-old children and their mothers discussing a conflict related to homework, chores, bedtime or interactions with siblings during their visit to the laboratory (Castro et al., 2017).


Newborns and infants react to the world around them with facial movements. There is not yet sufficient evidence, however, to conclude that these facial movements reliably and specifically express the instances of any specific emotion category (findings summarized in Table 4). When considered alongside vocalizations and body movements, there is consistent evidence that infant facial movements reliably signal distress, interest and arousal, and perhaps serve as a call for help and comfort. In young children, instances of the same emotion category appear to be expressed by a variety of different muscle movements, and the same muscle movements occur during instances of various emotion categories, and even during non-emotional instances. It may be the case that reliability and specificity emerge through learning and development (see Box 10, in SOM), but this remains an open question that awaits future research.

Studies of Congenitally Blind Individuals

Another source of evidence to test the common view comes from observations of facial movements in people who were born blind. The assumption is that people who are blind cannot learn, by watching others, which facial muscles to move when expressing emotion. Based on this assumption, several studies have claimed to find evidence that congenitally blind individuals express emotions with the hypothesized facial configurations in Figure 4 (e.g., blind athletes show expressions that are reliably interpreted as shame and pride, ; see also ). People who are born blind learn through other sensory modalities, however (for a review, see ), and therefore can learn whatever regularities exist between emotional states and words for facial movements from hearing descriptions in conversation, in books and movies, and by direct instruction.28 As an example of such learning, Olympic athletes who won medals smiled only when they knew they were being watched by other people, such as when they were on the podium facing the audience; in other situations, such as while waiting behind the podium or while on the podium facing away from people but towards a flag, they did not smile (but presumably were still very happy; Fernandez-Dols et al., 1995). Such findings are consistent with the behavioral ecology view of facial expressions (Fridlund, 1991, 2017) and with more recent sociological evidence that smiles are social cues that can communicate different social messages depending on the cultural context (Martin, Rychlowska, Wood and Niedenthal, 2017).

The limitations that apply to studies of emotional expressions in sighted individuals, reviewed throughout this paper, are even more applicable to scientific studies of emotional expressions in the blind.29 Participants are given pre-determined emotion categories that shape their possible responses, and facial movements are often quantified by human judges who have their own biases when making commonsense judgments (e.g., Galati et al., 1997; Galati et al., 2001; Valente et al., 2017). In addition, people who are blind make additional, often unusual movements of the head and the eyes (Chiesa et al., 2015). For example, people who are blind from birth often move their head in unusual ways to better hear objects or echoes. These unusual movements might interfere with or contaminate expressive facial movements. More importantly, they reveal whether a participant is blind or sighted, and this knowledge can bias human raters who are judging the presence or absence of facial movements in emotional situations.

Helpful insights about the facial expressions of congenitally blind individuals come from a recent review (Valente et al., 2017) that surveyed 21 studies published between 1932 and 2015. These studies observed how blind participants move their faces during instances of emotion and then compared those movements both to the proposed expressive forms in Figure 4 and to the facial movements of sighted people. Both spontaneous facial movements and posed movements were tested. Eight older studies (published between 1932 and 1977) reported that congenitally blind individuals spontaneously expressed emotions with the proposed facial configurations in Figure 4, but Valente et al. (correctly) questioned the objectivity of these studies because the data were largely based on subjective impressions offered by researchers or their assistants. The 13 studies published between 1980 and 2015 were better designed: they videotaped participants’ facial movements and described them using a formal facial coding system like FACS for adults or a similar coding system for children. These studies are too few in number and have insufficient sample sizes to conduct a formal meta-analysis, but taken together they suggest that, in general, congenitally blind individuals spontaneously moved their faces in similar ways to sighted individuals during instances of emotion: both groups expressed instances of anger, disgust, fear, happiness, sadness or surprise with the proposed expressive configurations (or their individual AUs) in Figure 4 with either weak reliability or no reliability at all, and neither group produced any of the configurations with any specificity (e.g., Galati et al., 2001; Galati et al., 2003; Galati et al., 1997). The lack of specificity is not surprising given that, upon closer inspection, several of the studies discussed in Valente et al. (2017) compared emotion categories that systematically differ in their prototypical affective properties, contrasting facial movements in pleasant vs. unpleasant circumstances (e.g., Cole et al., 1989), or observed facial movements only in pleasant circumstances without distinguishing the facial AUs for the happiness category vs. other positive emotion categories (e.g., Chiesa et al., 2015), such that their findings cannot be interpreted unambiguously as evidence pertaining to emotional expressions, per se.

While congenitally blind and sighted individuals were similar to one another in the variety of their spontaneous facial movements, they differed in their posed facial configurations. After listening to descriptions of situations that were supposed to elicit an instance of anger, sadness, fear, disgust, surprise, or happiness, sighted participants posed their faces with the proposed expressive forms for the negative emotion categories in Figure 4 at higher levels of reliability and specificity than did blind participants (Galati et al., 1997; Roch-Levecq, 2006). These findings suggest that congenitally blind individuals have different beliefs about emotional expressions or that their knowledge of social rules for producing those configurations on command differs from that of sighted individuals.


The evidence from studies of blind individuals is consistent with the other scientific evidence reviewed so far (Table 4). Even in the absence of visual experience, blind individuals, like sighted individuals, develop the ability to spontaneously make a variety of facial movements to express emotion, and those movements do not reliably and specifically configure as proposed by the common view of emotion (depicted in Figure 4). Learning to voluntarily pose the proposed expressions in Figure 4 does seem to covary with vision, however, further emphasizing that posed and spontaneous expressions should be treated as different phenomena. Further scientific attention is warranted to examine how congenitally blind individuals learn, via other sensory modalities, to express emotions.

Summary of Scientific Evidence on the Production of Facial Expressions

The scientific findings we have reviewed thus far, dealing with how people actually move their faces during emotional events, do not strongly support the common view that people reliably and specifically express instances of emotion categories with spontaneous facial configurations that resemble those proposed in Figure 4. Adults around the world, infants and children, and congenitally blind individuals all show much more variability than commonly hypothesized. Studies of posed expressions further suggest that particular facial movements are linked to particular emotions more by consensus and beliefs than by scientific evidence for “emotion expression.” Consequently, commonly used phrases such as “emotional facial expression,” “emotional expression” or “emotional display” are misleading. More neutral phrases that assume less, such as “facial configuration” or “pattern of facial movements” or even “facial actions,” should be used instead.

We next turn our attention to the question of whether people reliably and specifically infer certain emotions from certain patterns of facial movements, shifting our focus from studies of production to studies of perception. It has long been assumed that emotion perception provides an indirect way of testing the common view of emotion production, because facial expressions, when they are assumed to be displays of internal emotional states, are thought to have co-evolved with the ability to recognize and read them (). For example, Shariff and Tracy (2011) have suggested that emotional expression (production) and emotion perception likely co-evolved as an integrated signaling system (for additional discussion, see ).30 In the next section, we review the scientific evidence on emotion perception.

Perceiving Emotions from Facial Movements: A Review of the Scientific Evidence

For over a century, an active line of research has directly examined whether people reliably and specifically infer emotional meaning in the facial configurations presented in Figure 4. Most of these studies are interpreted as evidence for people’s ability to recognize or decode emotion in facial configurations, on the assumption that the configurations broadcast or signal emotional information to be recognized or detected. This is yet another example of confusing what is known and what is being tested. A more correct interpretation is that these studies indicate whether people reliably and specifically infer or judge emotion in those facial configurations. This pervasive confusion in the scientific literature may explain why very few studies have actually investigated the processes by which people detect the onset and offset of facial movements and infer emotions in those movements (i.e., few studies consider the mechanisms by which people infer emotional states from detecting and perceiving facial movements) (for discussion, see Martinez, 2017a, 2017b). In this section, we first review the design of typical emotion perception experiments that are used to test the common view that emotions can be reliably and specifically “read out” from facial movements. We also examine whether people infer emotions from the facial movements in dynamic, computer-generated faces, a class of studies that offers a more data-driven way to study emotion perception, and in virtual humans, which provide the opportunity for a more implicit approach to studying emotion perception.

The Anatomy of a Typical Experiment Designed to Observe Whether People Reliably and Specifically Infer Emotion in Facial Movements

For a person (a perceiver) to infer that another person is in an emotional state by looking at that person’s facial movements, the perceiver must have many competencies. People move their faces continuously (i.e., real human faces are never still), so a perceiver must notice or detect the relevant facial movements in question and discriminate them from other facial movements (that is, the perceiver must be able to set a perceptual boundary to know when the movements begin and end, and, for example, that a scowl is different from a sneer). The perceiver must be able to identify (or segment) the movements as an ensemble or pattern (i.e., bind them together and distinguish them from other movements that are normally inferred to be irrelevant). And the perceiver must be able to infer similarities and differences between different instances of facial movements, as specified by the task (e.g., categorize a group of facial movements as instances expressing anger, fear, etc.). This categorization might involve merely labeling the facial movements, referred to as action identification (describing how a face is moving, such as smiling), or it might involve inferring that a particular mental state caused the actions, referred to as mental inference or mentalizing (inferring why the action is performed, such as a state of happiness; ). In principle, the categorization could also involve inferring a situational cause for the actions, but in practice, this question is rarely investigated in studies of emotion perception. The overwhelming majority of studies ask participants to make mental inferences, although as we discuss later in this section, there appears to be important cultural variation in whether emotions are perceived as situated actions vs. as mental states that cause actions.

The use of posed configurations of facial movements in assessments of emotion perception.

The majority of the experiments that study emotion perception ask participants to infer emotion in facial configurations that are posed by actors who were not in an emotional state when the photos were taken or by computer-generated humans who have no actual emotional state. As a consequence, it is not possible to assess the accuracy (i.e., validity) of perceivers’ emotional inferences and, correspondingly, data from emotion perception studies cannot be interpreted as support for the validity of common beliefs about emotional expressions. As is the case in studies of expression production, it is more appropriate to interpret participants’ responses in terms of their agreement (or consensus) with common beliefs. Even more serious is the fact that the proposed expressive facial configurations in Figure 4 do not capture the wider range of muscle movements that are observed when people express instances of these emotion categories. A recent study that mined over seven million images from the internet (for method, see Box 7 in SOM; ) identified multiple facial configurations associated with the same emotion category labels and their synonyms: 17 distinct facial configurations were associated with the word “happiness,” five with “anger,” four with “sadness,” four with “surprised,” two with “fear,” and one with “disgust.” The different facial configurations associated with each emotion word were more than mere variations on a universal core expression; they were distinctive sets of facial movements.31

Measuring emotion perception.

The typical emotion perception experiment takes one of several forms, summarized in Table 5. Choice-from-array tasks, in which participants are asked to match photos of facial configurations and emotion words (with or without brief stories), have dominated the study of emotion perception since the 1970s. For example, a meta-analysis of emotion perception studies published in 2002 summarized 87 studies, 83 (95%) of which exclusively used a choice-from-array response method (). This method has been widely criticized for over two decades, however, because it limits the possibility of observing evidence that could disconfirm the common view. Participants are strongly constrained in how they can infer meaning in a facial configuration, such as a photograph of a scowling facial configuration, since their choices are limited to the options provided in the experiment (usually a small number of emotion words). In fact, the preponderance of choice-from-array tasks in the scientific study of emotion perception has been identified as one important factor that has helped perpetuate and sustain the common view (Russell, 1994). Other tasks exist for assessing emotion perception (see Table 5), including those that use a free-labeling method, where participants are asked to freely nominate words to label photographs of posed facial configurations, rather than choosing a word from a small set of predefined options. For example, upon viewing a scowling configuration, participants might offer words like “angry,” “sad,” “confused,” “hungry,” or even “wanting to avoid a social interaction.” By allowing participants more freedom in how they infer meaning in a facial configuration, free-labeling makes it equally possible to observe evidence that could either support or disconfirm the common view.
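To make concrete what "above chance" means in a choice-from-array task, the following minimal sketch computes the chance level for a hypothetical six-word array and an exact binomial tail probability for an observed agreement rate. The participant counts are invented for illustration and do not come from any study reviewed here; the function name `binomial_tail` is likewise an assumption of this sketch.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical design: six emotion words offered for one scowling photograph.
n_options = 6
chance = 1 / n_options

# Hypothetical data: 14 of 40 participants choose the expected word ("anger").
n_participants = 40
n_expected = 14

agreement = n_expected / n_participants
p_value = binomial_tail(n_expected, n_participants, chance)

print(f"chance level: {chance:.3f}")
print(f"observed agreement: {agreement:.3f}")
print(f"P(agreement this high by guessing alone): {p_value:.4f}")
```

Under these invented numbers, agreement is statistically above chance even though most participants did not choose the expected word, which is one reason an above-chance result should not be read as strong reliability.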

Recent innovations in measuring emotion perception use computer generated faces or heads rather than photographs of posed human faces. One method, called reverse correlation, measures participants’ internal model of emotional expressions (i.e., their mental representations of which facial configurations are likely to express instances of emotion) by observing how participants label an avatar head that displays random combinations of animated facial action units (Yu et al., 2012; for a review, see Jack et al., 2018; ). As each pattern appears (on a given test trial), participants infer its emotional meaning by choosing an emotion label from a set of options (a choice-from-array response). After thousands of trials, researchers estimate the statistical relationship between the dynamic patterns of facial movements and each emotion word (e.g., disgust) to reveal participants’ beliefs about which facial configurations are likely to express different emotion categories.
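The logic of reverse correlation can be sketched with a small simulation. The sketch below is a deliberately simplified illustration, not the procedure used in the cited studies: it uses static on/off action units rather than dynamic animations, a hypothetical four-AU subset, and a simulated observer whose hidden belief (AU9 signals disgust) is an assumption built in for demonstration.

```python
import random

# Hypothetical subset of facial action units and response options.
AUS = ["AU4", "AU9", "AU12", "AU26"]
LABELS = ["anger", "disgust", "happiness", "surprise"]


def simulated_observer(trial, rng):
    """Hypothetical perceiver: usually answers 'disgust' when AU9 is shown,
    otherwise picks a label at random from the array."""
    if trial["AU9"] and rng.random() < 0.8:
        return "disgust"
    return rng.choice(LABELS)


def run_reverse_correlation(n_trials=5000, seed=0):
    """Show random AU combinations, collect forced-choice labels, and
    estimate, per label, how much more often each AU was present than
    its overall base rate."""
    rng = random.Random(seed)
    shown = {au: 0 for au in AUS}
    labeled = {lab: 0 for lab in LABELS}
    joint = {lab: {au: 0 for au in AUS} for lab in LABELS}

    for _ in range(n_trials):
        # Each AU is independently switched on with probability 0.5.
        trial = {au: rng.random() < 0.5 for au in AUS}
        label = simulated_observer(trial, rng)
        labeled[label] += 1
        for au in AUS:
            if trial[au]:
                shown[au] += 1
                joint[label][au] += 1

    # P(AU present | label) minus P(AU present) for every label/AU pair.
    return {
        lab: {
            au: joint[lab][au] / max(labeled[lab], 1) - shown[au] / n_trials
            for au in AUS
        }
        for lab in LABELS
    }


model = run_reverse_correlation()
for au, weight in model["disgust"].items():
    print(f"disgust ~ {au}: {weight:+.3f}")
```

In this toy setting the recovered weights reveal the observer's built-in belief, which is the sense in which the method estimates beliefs about expressions rather than the validity of those beliefs.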

A second approach using computer-generated faces has participants interact with more fully developed virtual humans (Rickel, Marsella, et al., 2003), also known as Embodied Conversational Agents (Cassell et al., 2000). Virtual humans are software-based artifacts that look like and act like people (for examples, see Figure 7). They are similar to characters in video games in their surface appearance and are designed to interact face-to-face with humans using the same verbal and nonverbal behavior that people use to interact with one another. The underlying technologies used to realize virtual humans vary considerably in approach and capability, but most virtual human models can be programmed to make context-sensitive, dynamic facial actions that, when made by a person, would typically communicate emotional information to other people (see Box 11 in SOM for discussion). The majority of the scientific studies with virtual humans were not designed to test whether human participants infer specific emotional meaning in a virtual human’s facial movements, but their design makes them useful for studying when and how facial movements take on meaning as emotional expressions: unlike all the other ways of assessing emotion perception discussed so far, which ask participants to make explicit inferences about the emotional cause of facial configurations, interactions with virtual humans allow scientists to study how a participant implicitly infers emotional meaning during social interactions.

Figure 7. Examples of virtual humans. Virtual humans are software-based artifacts that look like and act like people.

(A) Feng et al., 2017; (B) Zoll et al., 2006; (C) Hoyt et al., 2003; (D) Marsella et al., 2000.

Testing the common view from observations of whether certain facial configurations are reliably and specifically perceived as expressions of certain emotion categories.

Traditionally, in most experiments, if participants reliably infer an emotional state from a facial configuration (e.g., inferring anger from a scowling facial configuration) at levels that are greater than what would be expected by chance, then this is taken as evidence that people recognize an emotional state in its facial display. It is more scientifically correct, however, to interpret this as evidence that people infer an emotional state (i.e., they consistently make a reverse inference) unless the inference has been verified as valid (i.e., the person in the photograph is, indeed, in the expected emotional state). Only when reverse inferences are observed in a reliable and specific way within an experiment can scientists reasonably infer that participants are perceiving an instance of a certain emotion category in a certain facial configuration; technically, the inference holds only for emotion perception as it occurs in the particular situations contained in the experiment (because situations are never randomly sampled). If the emotion perception evidence replicates across experiments that sample people from the same culture, then the interpretation can be generalized to emotion perceptions in that culture. Only when the findings generalize across cultures (that is, replicate across experiments that sample people from different cultures) is it reasonable to conclude that people universally infer a specific emotional state when perceiving a specific facial configuration. These findings might also be interpreted as evidence about the reliability and specificity of producing emotional expressions if the co-evolution assumption is valid (i.e., that emotional expressions and their perception co-evolved as an integrated signaling system; Ekman et al., 1972; Jack et al., 2016; ).
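The difference between a consistent reverse inference and a valid one can be illustrated with Bayes' rule. The sketch below is only an illustration under stated assumptions: the three probabilities are hypothetical values chosen for arithmetic convenience, not estimates from the literature, and the function name is invented for this example.

```python
def posterior_anger_given_scowl(
    p_scowl_given_anger: float,
    p_scowl_given_other: float,
    p_anger: float,
) -> float:
    """Bayes' rule: P(anger | scowl) from the production-side quantities.

    p_scowl_given_anger: how often a scowl occurs during anger (reliability).
    p_scowl_given_other: how often a scowl occurs when the person is not
        angry (low values correspond to high specificity).
    p_anger: base rate of anger in the situations being sampled.
    """
    numerator = p_scowl_given_anger * p_anger
    evidence = numerator + p_scowl_given_other * (1 - p_anger)
    return numerator / evidence


# Hypothetical values: scowls accompany 30% of anger episodes, 10% of
# non-anger episodes, and anger occurs in 10% of sampled situations.
posterior = posterior_anger_given_scowl(0.30, 0.10, 0.10)
print(f"P(anger | scowl) = {posterior:.2f}")  # 0.25 under these assumptions

# With perfect specificity (scowls never occur outside anger), the same
# reliability would license the reverse inference completely.
print(posterior_anger_given_scowl(0.30, 0.0, 0.10))  # 1.0
```

Under these assumed numbers, a perceiver who always reads a scowl as anger would be wrong three times out of four, no matter how consistently different perceivers agree with one another.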

Studies of Healthy Adults From the U.S. and Other Developed Nations

Studies that measure emotion perception with choice-from-array tasks.

The most recent meta-analysis of emotion perception studies was published in 2002 (). It statistically summarized 87 experiments in which over 22,000 participants from over 20 cultures around the world inferred emotional meaning in facial configurations and other stimuli (such as posed vocalizations). The majority of participants were sampled from larger-scale or developed countries, including Argentina, Brazil, Canada, Chile, China, England, Estonia, Ethiopia, France, Germany, Greece, Indonesia, Ireland, Israel, Italy, Japan, Malaysia, Mexico, the Netherlands, Scotland, Singapore, Sweden, Switzerland, Turkey, the US, Zambia and various Caribbean countries. The majority of studies (95%) used posed facial configurations; only four studies had participants label spontaneous facial movements, a dramatic example of the challenges facing validity that we discussed earlier. All but four studies used a choice-from-array response method to measure emotion inferences, a good example of the challenges facing hypothesis disconfirmation that we discussed earlier.

The results of the meta-analysis, presented in Figure 8a, reveal that perceivers inferred emotions in the facial configurations in Figure 4 in line with the common view, well above chance levels (using the criteria set out by , presented in Table 2). Results provided strong evidence that, when participants viewed posed facial configurations made by people from their own culture, they reliably perceived the expected emotion in those configurations: scowling facial configurations were perceived as anger expressions, wide-eyed facial configurations were perceived as fear expressions, and so on, for all six emotion categories. Moderate levels of reliability were observed when perceivers were labeling facial configurations posed by people from other cultures; this difference in reliability between same-culture and cross-culture judgments is referred to as an ingroup advantage (see Box 12, in SOM). The majority of emotion perception studies do not report whether the hypothesized facial configurations are perceived with any specificity (e.g., how likely was a scowl to be perceived as expressing an instance of emotion categories other than anger, or as an instance of a mental category that is not considered emotional). Without information about specificity, no firm conclusions can be drawn about the emotional meaning of the facial configurations in Figure 4, especially for the translational purpose of inferring someone’s emotional state from their facial comportment in real life.

Figure 8. Emotion perception findings.

(A) Average effect sizes for perceptions of facial configurations from , in which 95% of the articles summarized used choice-from-array to measure participants’ emotion inferences. (B) Free-labeling of facial configurations across five language groups from . IDs chosen represent the best match to the commonsense facial configurations in Figure 4 based on AUs present. No configuration discovered in this study exactly matches the AU configurations proposed by Darwin or documented in prior research. The proportion of times participants offered emotion category labels (or their synonyms) is reported. According to standard scientific criteria, universal expressions of emotion should elicit agreement rates that are considerably higher than those reported here, generally in the 70-90% range, even when methodological constraints are relaxed (). Specificity data were not available for the meta-analysis.

Nonetheless, most of the studies cited in the meta-analysis interpret their reliability findings alone as evidence for the reverse inference of inferring anger from a scowling face, disgust from a nose-wrinkled face, fear from a wide-eyed gasping face, and so on. Such findings may explain why many scientists who study emotion, when surveyed, indicated that they believe compelling evidence exists for the hypothesis that certain emotion categories are each expressed with a unique, universal facial configuration (see Ekman, 2016) and interpret variation in emotional expressions to be caused by cultural learning that modifies what are presumed to be inborn universal expressive patterns (e.g., Cordaro et al., 2017; Ekman, 1972; Elfenbein, 2013). Cultural learning has also been hypothesized to modify how people “decode” facial configurations during emotion perception (Buck, 1984).

Studies that measure emotion perception with free-labeling tasks.

As we foreshadowed, experimental methods that place fewer constraints on participants’ inferences in experiments that measure emotion perception (Table 5) provide considerably less support for the common view of emotional expressions. In the least constrained experimental task, called free-labeling, perceivers freely volunteer a word (emotion or otherwise) that they believe best captures the meaning in a facial configuration rather than choosing from a small set of experimenter-chosen options. In urban samples, participants who freely label facial configurations produce the expected emotion labels with weak reliability (when labeling spontaneously produced facial configurations) to moderate reliability (when labeling posed facial configurations), and usually reveal weak specificity when it is assessed at all (for examples and discussion, see Russell, 1994; also see ). For example, when participants from many countries where English, Spanish, Mandarin Chinese, Farsi, Arabic or Russian is spoken as a first language were asked to freely provide emotion words to label each of 35 facial configurations that had been cross-culturally identified (), their labels provided evidence of a moderately reliable correspondence between facial configurations and emotion categories, but there was no evidence of specificity (see Figure 8b).32 Multiple facial configurations were associated with the same emotion category label (e.g., 17 different facial configurations were associated with the expression of happiness, five with anger, four with sadness, four with surprise, two with fear, and one with disgust). This many-to-many mapping is inconsistent with the common view that the facial configurations in Figure 4 are universally recognized as expressing the hypothesized emotion category, and it gives evidence of variation that is far beyond what is proposed by the basic emotion view. 
Some of this variability may come from variability across different cultures and languages, but there is variability even within a single culture and language. Evidence of this many-to-many mapping is apparent in free-labeling tasks in small-scale, remote samples as well (Gendron et al., 2018), which we discuss in the next section.

Studies that measure emotion perception with the reverse correlation method.

Using a choice-from-array response method with the reverse correlation method is an inductive way to learn people’s beliefs about which facial configurations express an emotion category (for a review, see Jack et al., 2018; ). In such studies, participants view thousands of random combinations of AUs that are computer generated on an avatar head and label each one by choosing an emotion word from a set of pre-defined options. All of the facial configurations labeled with the same emotion word (e.g., anger) are then statistically combined for each participant to estimate a belief about which facial movements express the corresponding emotion category. One recent study using the reverse correlation method with U.K. and Chinese participants found evidence of both variation in the facial movements that were judged to express a single emotion category and similarity in the facial movements that were judged to express different categories (Jack et al., 2016). The study first identified groupings of emotion words that are widely discussed in the scientific literature (which, we should note, is dominated by English), corresponding to 30 English words grouped into eight emotion categories for the U.K. sample (happy/excited/love, pride, surprise, fear, contempt/disgust, anger, sad and shame/embarrassed) and 52 Chinese words grouped into twelve categories for the Chinese sample (joyful/excitement, pleasant surprise, great surprise/amazement, shock/alarm, fear, disgust, anger, sad, embarrassment, shame, pride, and despise). The reverse correlation method revealed 62 separate facial configurations: the same emotion category in a given culture was associated with multiple models of facial movements because synonyms of the same emotion category were associated with distinctive models of facial movements. 
Amidst this variability, Jack and colleagues also found that these 62 separate facial configurations could be summarized as four prototypes, which are presented in Table 6 along with the emotion words with which they were frequently associated. Each prototype was described with a unique set of affective features (combinations of valence, arousal and dominance). When the four estimated configurations are compared with the common view presented in Figure 4, along with the basic emotion hypotheses listed in Table 1, there are some striking similarities: Configuration 1 most closely resembles the proposed expression for happiness, configuration 2 is similar to a combination of the proposed expressions for fear and anger, configuration 3 most closely resembles the proposed expression for surprise, and configuration 4 is similar to a combination of the proposed expressions for disgust and anger.33 Taken together, these findings suggest that, at the most general level of description, participants’ beliefs about emotional expressions (i.e., their internal models of which facial movements expressed which emotions) were consistent with the common view (indeed, they could be taken to constitute part of the common view), but when examined in finer detail, participants also believed that there is substantial within-category variation in the facial movements that express instances of the same emotion category. This finding suggests that the way the common view is often described in reviews, depicted in the media, and used in many applications does not in fact do justice to people’s more detailed beliefs about variability in facial expressions.

Studies that implicitly assess emotion perception during interactions with virtual humans.

Designers typically study how a virtual human’s expressive movements influence an interaction with a human participant. Much of the early research modeling expressive movements in virtual humans focused on endowing them with the facial expressions proposed in Figure 4. A number of studies have endowed virtual humans with blends of these configurations (Bui et al., 2004; Arya et al., 2009). Designers are also inspired by other people’s beliefs about how emotions are expressed. Actors, for example, have been asked to pose facial configurations that they believe express emotions, which are then processed by graphical and machine learning algorithms to craft the relation between emotional states and expressive movements (Alexander et al., 2009). In another study, human subjects used a specially designed software tool to craft animations of facial movements that they believed express certain mental categories, including emotion categories. Then, other human subjects judged the crafted facial configurations (Ochs et al., 2010). Increasingly, data-driven methods are used that place people in emotion-eliciting conditions, capture their facial and body motion, and then synthesize animations from those captured motions (Niewiadomski et al., 2015; Ding et al., 2014, ).

In general, studies with virtual humans nicely show how the situational context influences how people infer the meaning of facial movements (de Melo et al., 2014). For example, in a game that allowed competition and cooperation (the Prisoner’s Dilemma), a virtual human who smiled after making a competitive move evoked more competitive, less cooperative responses from human participants than a virtual human that used an identical strategy in the game (tit-for-tat) but smiled after cooperating. Virtual humans who made a verbal comment about a film that was inconsistent with their facial movements, such as saying they enjoyed the film while grimacing and then quickly smiling, were perceived as less reliable, trustworthy and credible.

The dynamics of the facial actions, including the relative timing, speed and duration of the individual facial actions, as well as the sequence of facial muscle movements over time, offer information over and above the mere presence or absence of the movements themselves. These dynamics have an important influence on how human perceivers interpret facial movements (e.g., Ambadar et al., 2009; Keltner, 1995; Krumhuber et al., 2013) and on how much they trust a virtual human during a social interaction (Krumhuber et al., 2009). Research with virtual humans has shown that the dynamics of facial muscle movements are critical for those movements to be perceived as emotional expressions (Niewiadomski et al., 2015; Ochs et al., 2010). These findings are consistent with research showing that temporal dynamics carry information about the emotional meaning of facial movements that are made by real humans (e.g., Kamachi et al., 2001).34


Whether or not people can reliably perceive emotions in the expressive configurations of Figure 4, as predicted by the common view, depends on how participants are asked to report or register their inferences (see Table 4). Hundreds of experiments have asked participants to infer the emotional meaning of posed, exaggerated facial configurations like those presented in Figure 4 by choosing a single emotion word from a small number of options offered by scientists, an approach called the choice-from-array task. This experimental approach tends to generate moderate to strong evidence that people reliably label scowling facial configurations as angry, frowning facial configurations as sad, and so on for all six emotion categories that anchor the common view. Choice-from-array tasks severely limit the possibility of observing evidence that can disconfirm the common view of emotional expressions, however, because they restrict participants’ options for inferring the psychological meaning of facial configurations by offering them a limited set of emotion labels. (As we discuss below, when people are provided with labels other than angry, sad, afraid, and so on, they routinely choose them; see also Crivelli et al., 2017.) Additionally, the specificity of those judgments is largely unreported. Scientists often go further and interpret the reliability findings from these studies as evidence that scowls are expressions of anger, frowns are expressions of sadness, and so on. This logic is not sound, however, because most of these studies ask participants to infer emotion in posed, static faces, which are likely limited in their validity (i.e., people posing facial configurations like those depicted in Figure 4 are unlikely to be in the hypothesized emotional state). Furthermore, other ways of assessing emotion perception, such as the reverse correlation method and free-labeling tasks, find much weaker evidence for reliability and/or specificity of emotion inferences.
Instead, they suggest that what people actually infer and believe about facial movements incorporates considerable variability. In short, the common view depicted in many reviews, summaries and the media, and used in numerous applications, is not an accurate reflection of what people in fact believe about facial expressions of emotion when probed in more detail. In the next section, we discuss scientific evidence from studies of emotion perception in small-scale remote cultures, which further undermines the common view.

Studies of Healthy Adults Living in Small-Scale, Remote Cultures

A growing number of studies examine emotion perception in people from remote, non-industrialized groups. A more in-depth review of these studies can be found in Gendron et al. (2018). Our goal here is to summarize the trends found in this line of research (see Table 7).

Table 7:

Summary of cross-cultural emotion perception in small-scale societies

Unsupported    Weak Support    Moderate Support    Strong Support
Free-labelingFore, PNGa100Sorenson (1975), Sample 2 cSadong, Borneo a115Sorenson (1975), Sample 4 b
Bahinemo, PNG71Sorenson (1975), Sample 3
Hadza, Tanzania43Gendron et al. (2018), Study 1
Trobrianders, PNG32fCrivelli et al. (2017), Study 1
Cue-to-cue matchingShuar, Ecuador23, Study 2
Choice-from array: Matching face and wordsFore, PNGa32Ekman et al. (1969)c
Mwani, Mozambique36efCrivelli, Jarillo et al. (2016), Study 2Sadong, Borneo a115Ekman et al. (1969)b
Trobrianders, PNG24fCrivelli et al. (2017), Study 2Dioula, Burkina39, Study 2
Trobrianders, PNG68 efCrivelli, Jarillo et al. (2016), Study 1
Trobrianders, PNG36 fCrivelli, Russell et al. (2016), Study 1a
Choice-from array: Matching face and scenarioHadza, Tanzania54Gendron et al. (2018), Study 2Dani, New Guineaa34Described in Ekman, 1972g
Fore, PNG a1189, 130 ed
Fore, PNG a1189, 130 eSorenson (1975), Sample 1 d

Notes. Findings summarized for anger, disgust, fear, sadness and surprise; happiness is the only pleasant category tested in all studies but and therefore perception can be (and likely is) guided by distinguishing valence in those studies. All studies used photographs of posed facial configurations that are similar to those in Figure 4, except Crivelli, Jarillo et al. (2016), Study 2 and Crivelli et al. (2017), Study 1. The study was designed to examine emotion perception from vocalizations but is included because perceivers matched them to faces; in addition, participants were tested in a second language (Spanish) in which they received training. No choice-from-array study carefully controlled whether foils and target facial configurations could be distinguished by valence and/or arousal, except Gendron et al. (2018), Study 2. N = sample size. All participants were adults unless otherwise specified as adol = adolescents, ch = children. PNG = Papua New Guinea. Unsupported = reliability and specificity at chance, or any level of reliability above chance combined with evidence of no specificity. Weak support = reliability between 20% and 40% (weak) for at least a single emotion category other than happiness combined with above-chance specificity for that category, or reliability between 41% and 70% (moderate) for at least a single category other than happiness with unknown specificity. Moderate support = reliability between 41% and 70% (moderate) combined with any evidence of above-chance specificity for those categories, or reliability above 70% (strong) for at least a single category other than happiness with unknown specificity. Strong support = strong evidence of reliability (above 70%) and strong evidence of specificity for at least a single emotion category other than happiness.

Superscript a: Specificity levels were not reported.

Superscript a1: Specificity inferred from reported results.

Superscript b: The sample size, marginal means and exact pattern of errors reported for the Sadong samples are identical in Sorenson (1975), Sample 3 and Ekman et al. (1969); Sorenson described using a free-labeling method and Ekman et al. (1969) described using a choice-from-array method in which participants were shown photographs and asked to choose a label from a small list of emotion words; Ekman (1994) indicated, however, that he did not use a free-labeling method, implying that the samples are distinct.

Superscript c: Sorenson (1975), Sample 2 included three groups of Fore participants (those with little, moderate and most other-group contact). The pattern of findings is nearly identical for the subgroup with the most contact and the data reported for the Fore in Ekman et al. (1969); again, Sorenson described using a free-labeling method and Ekman et al. (1969) described using a choice-from-array method. It is questionable whether the Sadong and the Fore subgroup should be considered isolated (see Sorenson, 1975, p. 362 and 363), but we include them here to avoid falsely dichotomizing cultures as “isolated from” versus “exposed to” one another (Fridlund, 1994; Gewald, 2010).

Superscript d: These are likely the same sample because the sample sizes and pattern of data are identical for all emotion categories except for the fear category, which is extremely similar, and for the disgust category which includes responses for contempt in but was kept separate in Sorenson (1975).

Superscript e: Participants were children.

Superscript f: Participants were adolescents.

Superscript g: The Dani sample reported in Ekman, 1972 is likely a subset of the data from Ekman, Heider, Friesen, and Heider (unpublished manuscript).
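
The support categories defined in the note to Table 7 amount to a small decision rule, which can be sketched in code. The sketch below is only an illustration of that rule as worded in the note: the function name, the `Specificity` labels and the `chance_pct` parameter are hypothetical, the note does not say how to treat reliability that falls between 40% and 41%, and combinations the note does not define are returned as "unclassified" rather than guessed.

```python
from enum import Enum


class Specificity(Enum):
    """Hypothetical labels for the specificity evidence a study reports."""
    NONE = "none"              # evidence of no specificity
    UNKNOWN = "unknown"        # specificity not reported
    ABOVE_CHANCE = "above"     # some evidence of above-chance specificity
    STRONG = "strong"          # strong evidence of specificity


def classify_support(reliability_pct, chance_pct, specificity):
    """Map one non-happiness category's results onto the Table 7 labels.

    reliability_pct: percentage of participants giving the expected response.
    chance_pct: chance level for the task (an assumption supplied by the
        caller; it depends on the number of response options).
    """
    # Reliability at chance is treated as unsupported (assumption: the note
    # pairs chance reliability with chance specificity).
    if reliability_pct <= chance_pct:
        return "unsupported"
    # Any above-chance reliability with evidence of no specificity.
    if specificity is Specificity.NONE:
        return "unsupported"

    has_specificity = specificity in (Specificity.ABOVE_CHANCE, Specificity.STRONG)

    if reliability_pct > 70:            # strong reliability
        if specificity is Specificity.STRONG:
            return "strong"
        if specificity is Specificity.UNKNOWN:
            return "moderate"
        return "unclassified"           # not defined in the note
    if reliability_pct > 40:            # moderate reliability (41% to 70%)
        return "moderate" if has_specificity else "weak"
    if reliability_pct >= 20:           # weak reliability (20% to 40%)
        return "weak" if has_specificity else "unclassified"
    return "unclassified"               # above chance but below 20%


if __name__ == "__main__":
    # Hypothetical inputs, not values taken from any study in Table 7.
    print(classify_support(75, 100 / 6, Specificity.UNKNOWN))
    print(classify_support(30, 100 / 6, Specificity.ABOVE_CHANCE))
```

Under this reading, a study with high reliability but unreported specificity can reach at most "moderate" support, which is why the absence of specificity data matters for how the early choice-from-array studies are evaluated.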

Studies that measure emotion perception with choice-from-array tasks.

During the period from 1969 to 1975, somewhere between five and eight small-scale samples from remote cultures in the South Pacific were studied with choice-from-array tasks to investigate whether participants perceived emotional expression in facial movements in a way similar to people from the US and other industrialized countries of the Western world (see Figure 9a). Our uncertainty about the number of samples stems from reporting inconsistencies in the published record (see note to Table 7). We present the findings here according to how the original authors reported them, despite the inconsistencies. Five samples performed choice-from-array tasks: in three, participants chose a photographed facial configuration to match a brief vignette that described each emotion category (Ekman, 1972; Sorenson, 1975), and in two, they chose a photograph to match an emotion word. All five samples performing some version of a choice-from-array task provided strong evidence in support of cross-cultural reliability of emotion perception in small-scale societies. Evidence for specificity was not reported. Until 2008, all claims that anger, sadness, fear, disgust, happiness and surprise are universally recognized (and therefore are universally expressed) were largely based on three papers (two of them peer reviewed) reporting on four samples (Ekman, 1972; Ekman et al., 1969).35

Figure 9:

Map of cross-cultural studies of emotion perception in small-scale societies.

People in small-scale societies typically live in groupings of several hundred to several thousand that maintain autonomy in social, political and economic spheres. (A). Epoch 1 studies, published between 1969 and 1975, were geographically constrained to societies in the South Pacific. (B). Epoch 2 studies, published between 2008 and 2017, sample from a broader geographic range including Africa and South America, and are more diverse in the ecological and social contexts of the societies tested. This type of diversity is a necessary condition for discovering the extent of cultural variation in psychological phenomena (Medin et al., 2017). Reproduced with permission from Gendron et al. (2018).

Since 2008, 10 verifiably separate experiments observing emotional inferences in small-scale societies have been published or submitted for publication. These studies cover a greater diversity of social and ecological contexts, sampling five small-scale societies across Africa and the South Pacific (see Figure 9b), and use a greater diversity of research methods (listed in Table 5), including tasks that allow for the possibility of observing cross-cultural variation in emotion perception and therefore the possibility of disconfirming the common view. Six samples registered their emotion inferences using a choice-from-array task, in which participants were given an emotion word and asked to choose the posed facial configuration that best matched it or vice versa (Crivelli, Jarillo et al., 2016; Crivelli, Russell et al., 2016; Crivelli et al., 2017, Study 2; Gendron et al., 2018, Study 2). Only one study reported that participants selected an emotion word to match facial configurations similar to those in Figure 4 more reliably than would be expected by chance; effects ranged from weak (anger and fear) to strong (happiness), with surprise and disgust falling in the moderate range.36 Information about the specificity of emotion inferences was not reported. A close examination of the evidence from four studies by Crivelli and colleagues suggests weak to moderate levels of reliability for inferring happiness in smiling facial configurations (all four studies), sadness in frowning facial configurations (all four studies), fear in gasping, wide-eyed facial configurations (three studies), anger in scowling facial configurations (two studies) and disgust in nose-wrinkled facial configurations (three studies). A detailed breakdown of findings can be found in Box 13, in SOM.
None of the studies found specificity for any facial configuration, however, with one exception: smiling was reported as unique to happiness, but that finding did not replicate across samples.37

The final study using a choice-from-array task with people from a small-scale, remote culture is important because it involves the Hadza hunter-gatherers of Tanzania (Gendron et al., 2018, Study 2).38 The Hadza are a high-value sample for two reasons. First, universal and innate emotional expressions are hypothesized to have evolved to solve the recurring fitness challenges of hunting and gathering in small groups on the African savanna (Pinker, 1997); the Hadza offer a rare opportunity to study foragers who are currently living in an ecosystem that is thought to be similar to that of our Paleolithic ancestors.39 Second, the population is rapidly disappearing (http://www.sciencemag.org/news/2018/05/farmers-tourists-and-cattle-threaten-wipe-out-some-world-s-last-hunter-gatherers). Prior to this study, the Hadza had not participated in any studies of emotion perception, although they have been the subject of social cognition research more broadly (H. C. Barrett et al., 2016; Bryant et al., 2016). After listening to a brief story about a typical instance of anger, disgust, fear, happiness, sadness or surprise, Hadza participants chose the expected facial configuration more often than chance if the target and foil were distinguished by the affective property referred to as valence (i.e., a smiling configuration depicting a pleasant state vs. a scowling configuration depicting an unpleasant state). This result is consistent with anthropological studies of emotion (Russell, 1991), linguistic studies, and findings from other recent studies of participants from small-scale societies, such as the Himba (Gendron et al., 2014a, b) and the Trobriand Islanders (Crivelli, Jarillo et al., 2014). (Also see , described in Box 7, who showed that perceivers can reliably infer valence but not arousal in facial configurations.)
In addition, Hadza participants who had some contact with people from other cultures (they had some formal schooling or could speak Swahili, which is not their native language) were more consistently able to choose the common facial configuration than were those with no formal schooling who spoke minimal Swahili (for a similar finding with Fore participants in a free-labeling study, see Table 2 in Sorenson, 1975). Of the 27 Hadza participants who had minimal contact with other cultures, only 12 reliably chose the wide-eyed gasping facial configuration to match the fear story at above-chance levels. (Compare this finding to the observation that the hypothesized universal expression for fear, a wide-eyed gasping facial configuration, is understood as an aggressive, threatening display by Trobriand Islanders; , 2017; Crivelli, Russell et al., 2016.)

Studies that measure emotion perception with free-labeling tasks.

During the period from 1969 to 1975, between one and three small-scale samples from remote cultures in the South Pacific were studied with free-labeling to investigate emotion perception (reported in Sorenson, 1975; see Table 7). From 2008 onward, two additional studies were conducted, one using spontaneous facial configurations (Crivelli et al., 2017, Study 1) and the other using posed facial configurations (Gendron et al., 2018, Study 2). Overall, all five studies provide no evidence that the facial configurations in Figure 4 evolved to specifically express certain emotion categories. The three free-labeling studies reported in Sorenson (1975) produced variable results. The only replicable finding appears to be that participants labeled smiling facial configurations uniquely as happiness in all studies (as the only pleasant emotion category tested). The two newer free-labeling studies both indicated that participants only rarely spontaneously labeled facial configurations with the expected emotion labels (or their synonyms) above chance levels. Trobriand Islanders did not label the proposed facial configurations for happiness, sadness, anger, surprise or disgust with the expected emotion labels (or their synonyms) at above chance levels (although they did label the faces consistently with other words; Crivelli et al., 2017, Study 1). Hadza participants labeled smiling and scowling facial configurations at above chance levels as happiness (44%) and anger (65%), respectively (Gendron et al., 2018, Study 2). The word “anger” was not used to uniquely label scowling facial configurations, however, and was frequently applied to frowning, nose-wrinkled and gasping facial configurations.

Facial movements carry meaningful information, even if they do not reliably and specifically display internal emotional states.

The more recent studies of people living in small-scale, remote cultures suggest two noteworthy observations. First, even though people may not routinely infer anger from scowls, sadness from frowns, and so on, they do reliably infer other social meanings for those facial configurations, because facial movements often carry important information about a person’s inner state, such as their social motives (Crivelli et al., 2016, 2017; Rychlowska et al., 2015; Wood et al., 2016; for a discussion, see Fridlund, 2017; Martin et al., 2017). For example, as we mentioned earlier, Trobriand Islanders consistently labeled wide-eyed gasping faces (the proposed expressive facial configuration for the fear category) as signaling an intent to attack (i.e., a threat; for additional evidence in carvings and masks in a variety of cultures, including Maori, !Kung Bushmen, Himba, Eipo, see , 2017).

Second, people do not always infer internal psychological states (emotions or otherwise) from facial movements. People who live in non-Western cultural contexts, including Himba and Hadza participants, are more likely to assume that other people’s minds are not accessible to them, a phenomenon called opacity of mind in anthropology (Danziger, 2006). Instead, facial movements are perceived as actions that predict future actions in certain situations (e.g., a wide-eyed gasping face is labeled as “looking”; Crivelli et al., 2017; Gendron et al., 2014a; Gendron et al., 2018). Similar observations were unavailable for the earlier studies conducted by Ekman, Friesen and Sorenson because, according to Sorenson (1975), they directed participants to provide emotion terms. When participants spontaneously offered an action label (e.g., “she is just looking”) or a social evaluation (e.g., “he is ugly” or “he is stupid”), they were asked to provide an “affect term.” Findings like these suggest that there may be profound cultural variation in the type of inferences human perceivers make when looking at other human faces in general, an observation that has been raised by a number of anthropologists and historians.

A note on interpreting the data.

To properly interpret the scientific evidence, it’s crucial to consider the constraints placed on participants by the experimental tasks they are asked to complete, summarized in Table 5. In most urban and in some remote samples, experiments using choice-from-array tasks produce evidence supporting the common view: Participants reliably label scowling facial configurations as angry, smiling facial configurations as happy, and so on. (We don’t yet know whether perceivers are uniquely labeling each facial configuration as a specific emotion because most studies don’t report that information.) It has been known for almost a century that choice-from-array tasks help participants obtain a level of reliability in their emotion perceptions that is not routinely seen in studies using methods that allow participants to respond more freely, and this is one reason they were chosen for use in the first place (for a discussion, see , 2017; Russell, 1994). When participants are offered words for happiness, fear, surprise, anger, sadness, and disgust to register their inferences for a scowling facial configuration, they are prevented from judging a face as expressing other emotion categories (such as confusion or embarrassment), non-emotional mental states (e.g., a social motive, such as rejection or avoidance), or physical events (such as pain, illness or gas), thus inflating reliability rates within the task. When people are provided with other options, they routinely choose them. For example, participants label scowling faces as “determined” or “puzzled,” wide-eyed faces as “hopeful” and gasping faces as “pained” when they are provided with stories about those emotions rather than with stories of anger, surprise and fear (also see Crivelli et al., 2017). The problem is not with the choice-from-array task per se; rather, it lies in failing to consider alternative explanations for the observations in an experiment and therefore drawing unwarranted conclusions from the data.
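
The inflation of reliability rates under restricted response options can be illustrated with a small worked example. The sketch below is a toy model, not an analysis of any study reviewed here: the label proportions are made-up numbers, the function name is hypothetical, and the assumption that perceivers redistribute their responses proportionally among the offered words is a simplification chosen only to make the arithmetic visible.

```python
def forced_choice_shares(free_label_shares, offered_labels):
    """Renormalize free-labeling proportions over the labels a task offers.

    Assumption (toy model): a perceiver whose preferred label is not offered
    falls back on the offered labels in proportion to how often those labels
    are chosen under free labeling.
    """
    kept = {
        label: share
        for label, share in free_label_shares.items()
        if label in offered_labels
    }
    total = sum(kept.values())
    if total == 0:
        raise ValueError("none of the offered labels was ever used freely")
    return {label: share / total for label, share in kept.items()}


# Hypothetical free-labeling proportions for a scowling face (made-up data).
free_shares = {
    "anger": 0.30,
    "determined": 0.30,
    "puzzled": 0.20,
    "sadness": 0.10,
    "disgust": 0.10,
}

# The six words that a typical choice-from-array task offers.
offered = {"anger", "disgust", "fear", "happiness", "sadness", "surprise"}

forced_shares = forced_choice_shares(free_shares, offered)

if __name__ == "__main__":
    print(f"free labeling, 'anger': {free_shares['anger']:.2f}")
    print(f"forced choice, 'anger': {forced_shares['anger']:.2f}")
```

In this toy case the share of "anger" responses doubles, from 0.30 to about 0.60, purely because "determined" and "puzzled" are unavailable; no change in what perceivers infer is needed to produce the higher agreement rate.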

Choice-from-array tasks may do more than just limit response options and thereby make it difficult to disconfirm commonsense beliefs. The emotion words provided during the task may actually encourage people to see anger in scowls, sadness in pouts, and so on, or to learn associations between a word (such as “anger”) and a facial configuration (such as a scowl) during the experiment (e.g., Gendron et al., 2015). The potency of words is discussed in Box 14, in SOM.


The pattern of findings from the studies conducted with remote samples replicates and underscores the pattern observed in samples of participants from larger, more urban cultural contexts: Asking perceivers to infer an emotion by matching a facial configuration to an emotion word selected from a small array of options, or telling participants a brief story about a typical instance of an emotion category and asking them to pick a facial configuration from an array of two or three photos, generally inflates agreement rates, producing evidence that is more likely to support the hypothesis of reliable emotion perception when compared to data coming from less constrained response methods such as free labeling (see Table 4). This is particularly true for studies that include only one pleasant emotion category, i.e., happiness, where all foils differ from the target in valence, and therefore the robust reliability and specificity for inferring happiness from smiling in these studies may be the result of participants engaging in valence perception rather than emotion perception, per se. Studies that use less constrained tasks, designed to discover more freely how people perceive emotion, instead yield evidence that generally fails to support the common view. Less constrained studies suggest that perceivers infer more than one emotion category from the same facial configuration, infer the same emotion category in a variety of different configurations and often disagree about the set of emotion categories that they infer. Cultural variation in emotion perception is consistent with the variation we observed in the first section of this paper when we reviewed studies of emotional expression production (again, see Table 4), and is even consistent with the basics of face perception, which itself is determined by experience and cultural factors (Caldara, 2016).

Studies of Healthy Infants and Children

Some scientists concur with the common view that infants can read specific instances of emotion in faces from birth (Walker-Andrews, 2005). However, it is difficult to ascertain whether infants and young children possess the various capacities required to perceive emotion per se: simply detecting and discriminating facial movements is not the same as categorizing them to infer their emotional meaning. This is because it is challenging to design well-controlled experiments that do a good job of distinguishing these two capacities. Infants are preverbal, so scientists use other measurement techniques, such as the amount of time an infant looks at a stimulus, to infer whether infants can discriminate one facial configuration from another, and ultimately, whether infants categorize those configurations as emotionally meaningful (for a brief explanation, see Box 15, in SOM). This approach introduces several possible confounds because of the stimuli used in the experiments: infants and children are typically shown photographs of the proposed expressive forms that are similar to those presented in Figure 4 (e.g., Leppanen et al., 2009; Peltola et al., 2008). Infants are more familiar with some of these configurations than with others (e.g., most infants are more familiar with smiling faces than with scowls or frowns) and familiarity is known to influence perception (see Box 15, in SOM), making it difficult to know which features of a face are holding an infant’s attention (familiarity or novelty) and which might be the basis of categorization in terms of emotional meaning. The configurations proposed for each emotion category also differ in their perceptual features (e.g., the proposed expressions for fear and surprise contain widened eyes whereas the proposed expression for sadness does not), adding further ambiguity to the interpretation of findings.
For example, when an infant discriminates smiling and scowling facial configurations, it is tempting to infer that the child is discriminating expressions of anger and happiness when in fact the target of discrimination is the presence or absence of teeth in a photograph. Moreover, the facial configurations in question are usually made from exaggerated facial movements that are not typical of the expressive variation that children actually observe in their everyday lives (Grossmann, 2010). Furthermore, unlike adults, infants may have had little or no experience with viewing photographs of anything, including heads of people with no bodies and no context.

The most important and pervasive confound in developmental studies of emotion perception is that most studies are not designed to test whether infants and children discriminate facial configurations according to their emotional meaning or whether they are discriminating affective features (pleasant vs. unpleasant; high arousal vs. low arousal) (see Box 9, SOM). Often, a facial configuration that is intended to depict a pleasant instance of emotion (smiling in happiness) is compared to one that is intended to depict an unpleasant instance of emotion (e.g., scowling in anger, frowning in sadness or gasping in fear), or these configurations are compared to a neutral face at rest. (This problem is similar to the one encountered earlier in our discussion of emotion perception studies in adults from small-scale societies, in which perceptions of valence can be confused with perceptions of emotion categories.) For example, in one study, 16- to 18-month-olds preferred toys paired with smiling faces and avoided toys paired with scowling and gasping faces (Martin et al., 2014); this type of study cannot distinguish whether infants are differentiating pleasant from unpleasant, approach from avoidance, or something about a specific emotion. Another study reported that seven-month-olds distinguish sadness and anger when looking at faces, but only when the faces were paired with vocalizations. What is unclear is the extent to which the level of arousal or activation conveyed in the acoustic signals was most salient to infants. A recent study suggested that 10-month-old infants can differentiate between the high arousal, unpleasant scowling and nose-wrinkled facial configurations that are proposed as expressions of anger and disgust, suggesting that they can categorize these two facial configurations separately (Ruba et al., 2017).
Yet the scowling and nose-wrinkled facial configurations also differed in properties besides their proposed emotional meaning: scowling faces showed no teeth, but nose-wrinkled faces were toothy, and it is well known that infants use perceptual features such as “toothiness” to categorize faces (see Caron et al., 1985). If an infant looks longer at a (pleasant) smiling facial configuration after viewing several (unpleasant) scowling faces, this does not necessarily mean that the infant has discriminated and understands “happiness” from “anger”; the infant might have discriminated positive from negative, affective from neutral, familiar from novel, the presence of teeth from the absence, less eye sclera from more, or even different amounts of contrast in the photographs. To provide a sound basis for inferring that infants are processing specific emotional meaning, future experiments must be designed to rule out the possibility that infants are categorizing facial configurations into different groupings based on factors other than emotion.

As a consequence of these confounds, there is still much to learn about the developmental course of emotion perception abilities. By three months of age, infants can distinguish the facial features (the morphology) in the proposed expressive configurations for happiness, surprise, and anger, and, by seven months, they can discriminate the features in proposed expressive configurations for fear, sadness, and interest. Left uncertain is whether, beyond just discriminating between the mere appearance of particular facial features, infants also understand the emotional meaning that is typically inferred from those features. By seven months of age, infants can reliably infer whether someone is feeling pleasant or unpleasant when facial configurations are accompanied by sensory information from the voice. Only a handful of studies have attempted to test whether infants can infer emotional meaning in facial configurations rather than just discriminating between faces with different physical appearances, but they report conflicting results (Schwartz et al., 1985; Serrano et al., 1992). One promising future direction involves measuring the electrical signals (event-related potentials, or ERPs) in infant brains as they view the proposed expressive configurations for anger and fear categories (e.g., Kobiella et al., 2008). Both of these studies reported differential brain responses to the proposed facial configurations for anger and fear, but their findings did not replicate one another (and for certain measurements, they observed opposing effects; for a broader review, see Grossmann, 2015).

Studies that measure a child’s ability to use an adult caregiver’s facial movements to resolve ambiguous or threatening situations, referred to as social referencing, have been interpreted as evidence of emotion perception in infants. One-year-olds use social referencing to stay in close physical proximity to a caregiver who is expressing negative affect, while infants are more likely to approach novel objects if the caregiver expresses positive affect (Moses et al., 2001; Saarni et al., 2006). Similar results emerge from the caregiver’s tone of voice (Mumme, Fernald, & Herrera, 1996). In fact, by 14 months of age, the positive or negative tone of a caregiver’s voice influences what an infant will touch even more than do the caregiver’s facial movements or the content of what the adult is actually saying. These studies clearly suggest that infants can infer the valenced meaning of facial movements, at least when made by live (as opposed to virtual) people with whom they are familiar. But, again, these data do not help resolve what, if anything, infants infer about the emotional meaning of facial movements.

Learning to perceive emotions.

Children grow in emotionally rich social environments, making it difficult to run experiments that are capable of testing the common view of emotion perception while also taking into account the possible roles for learning and social experience. Nonetheless, several themes have emerged in the scientific literature, all of which suggest a clear role for learning and context in children’s developing emotion perception capacities.

One hypothesis that continues to be strongly supported by experiments is that children’s capacity to infer emotional meaning in facial movements depends on context (the conditions surrounding the face that may convey information about a face’s meaning). For example, emotion concept learning, as a potent source of internal context, shapes emotion perception capacity (discussed in Boxes 10 and 16 in SOM). There are also developmental changes in how people use context to shape their emotional inferences about facial movements. Children as young as 19 months old can detect facial movements that are emotionally incongruent with a context. For example, when presented with adult facial configurations that are placed on bodies posing an emotional context (e.g., a scowling facial configuration placed on a body holding a soiled diaper), children (aged four, eight, and twelve) moved their eyes back and forth between faces and bodies when deciding how to label the emotional meaning of the faces, whereas adult participants directed their gaze (and overt visual attention) to the face alone, judging its emotional meaning in a way that was independent of the bodily context. The youngest children were equally likely to label the scene based on face or context. The results of this experiment suggest that younger children devote greater attention to contextual information and actively cross-reference facial and contextual cues, presumably to better learn about and understand the emotional meaning of those cues.40

Another important source of context that shapes the development of emotion perception in children involves the broader environment in which children grow. Children who grow up in neglectful or abusive environments, where their emotional interactions with caregivers are highly atypical, have a different developmental trajectory than do those growing in more consistently nurturing environments (Pollak, 2015). Parents from these high-risk families produce unclear or context-inconsistent expressions of emotion (Shackman et al., 2010). Neglected children (who do not receive sufficient social feedback) show delays in perceiving emotions in the ways that adults do (Camras et al., 2006; Pollak et al., 2000), whereas children who are physically abused learn to preferentially attend to and identify facial movements that are associated with threat, such as a scowling facial configuration (Briggs-Gowan et al., 2015). Abused children require less perceptual information to infer anger in a scowling configuration and more reliably track the trajectory of facial muscle activations that signal threat. Children raised in physically abusive environments also more readily infer anger and threat in ambiguous facial configurations and then require more effortful control to disengage their attention from signs of threat when compared to children who have not been maltreated. This close attention to scowling faces with knitted eyebrows shapes how abused children understand what facial movements mean. For example, one study found that five-year-old abused children tended to believe that almost any kind of interpersonal situation could result in an adult becoming angry; by contrast, most non-abused children understand that anger is likely in particular interpersonal circumstances (Perlman et al., 2008).

By three years of age, North American children not only start to show reliability in their emotion perceptions but they also begin to show evidence of specificity. They understand that facial movements do not necessarily map on to emotional states, and that how someone really feels can be faked or masked. Moreover, they know what facial movements are expected in a particular context and try to produce them despite their feelings. The “disappointing gift” experiments developed by psychologist Pamela Cole and her colleagues demonstrate this well. In one study, preschool-aged children were told they would be rewarded with a gift after they completed a task. Later, children received a beautifully wrapped package that contained a disappointing item, such as a broken pair of cheap sunglasses. When facing a smiling unfamiliar adult who has presented them with a gift, children forced themselves to smile (lip corner pull, cheek raise, and brow raise) and to thank the experimenter. Yet, while the children were smiling, they often kept their eyes focused down, slumped their shoulders, and made negative statements about the object, indicating that they did not, in fact, feel positive about the situation (Cole, 1986). Moreover, there was no difference in the behavioral responses of visually impaired children when receiving a disappointing gift. Studies like this one provide a more implicit way of assessing children’s knowledge about emotion perception (i.e., they illustrate the inferences that children expect others to make from their own facial movements).


There is currently no clear evidence to support the hypothesis that infants and young children reliably and specifically infer emotion in the proposed expressive configurations for anger, disgust, fear, happiness, sadness and surprise categories (presented in Figure 4; findings summarized in Table 4). A more plausible interpretation of the existing evidence is that young infants infer affective meaning such as valence and arousal in facial configurations. Data from infants and young children obtained using a variety of methods further suggest that emotion perception abilities emerge and are shaped through learning in a social environment. These findings are consistent with evidence that the human face may be evolutionarily privileged to communicate importance or salience. But it is not clear that the expressive configurations proposed for specific emotion categories are similarly privileged in this way.

Summary of Scientific Evidence on the Perception of Emotion in Faces

The scientific findings from perception studies generally replicate those from production studies in failing to strongly support the common view. The one exception to this overall pattern of findings is seen in studies that ask participants to match a posed face to an emotion word or scenario. This method produces evidence to support the common view, even when it is applied to completely novel emotion categories with made-up expressive cues, opening up interesting questions about the psychological potency of the elements that make up choice-from-array designs (such as the emotion words embedded in the task or the choice of foils on a given trial). These findings reinforce our earlier conclusion that terms like “facial configuration” or “pattern of facial movements” or even “facial actions” are preferred to more loaded terms like “emotional facial expression,” “emotional expression” or “emotional display,” which can be at best misleading and at worst incorrect.

Summary and Recommendations

Evaluation of the Empirical Evidence

The common view that humans around the world reliably produce and recognize certain emotions in specific configurations of facial movements continues to echo within the science of emotion, even as scientists increasingly acknowledge that anger, sadness, happiness and other emotion categories are more variable in their facial expressions. This entrenched common view does more than guide the practice of science. It influences public understanding of emotion, and hence education, clinical practice, and applications in industry. Indeed, it reaches into almost every facet of modern life, including emoticons and movies. Nonetheless, there is insufficient evidence to support it. People do express instances of anger, disgust, fear, happiness, sadness and surprise with the hypothesized facial configurations presented in Figure 4 at above chance levels, suggesting that those facial configurations sometimes serve as expressions of emotion as proposed. However, the reliability of this finding is weak, and there is evidence that the strength of support for the common view varies systematically with the research methods used. The strongest support for the common view -- found in data from urban, industrialized or developed samples completing choice-from-array tasks -- does not show robust generalizability. Evidence for specificity is lacking in almost all research domains. A summary of the scientific evidence is presented in Table 4.

The research findings do not imply that people move their faces randomly or that the configurations in Figure 4 have no psychological meaning. Instead, they reveal that the facial configurations in question are not “fingerprints” or diagnostic displays that reliably and specifically signal particular emotional states regardless of context, person and culture. It is not possible to confidently infer happiness from just a smile, anger from a scowl, or sadness from a frown, as numerous technologies try to do when applying what their developers mistakenly believe to be the scientific facts.

Instead, the available evidence from different populations and research domains (infants and children, adults living in industrialized countries and in remote cultures, and even individuals who are congenitally blind) overwhelmingly points to a different conclusion: when facial movements do express emotional states, they are considerably more variable and dependent on context than the common view allows. There appear to be many-to-many mappings between facial configurations and emotion categories (e.g., anger is expressed with a broader range of facial movements than just a scowl and scowls express more than anger). A scowling facial configuration may be an expression of anger in the sense of being a part of anger in a given instance. But a scowling facial configuration is not the expression of anger in any generalizable or universal way. Scowling facial configurations and the others in Figure 4 belong to a much larger repertoire of facial movements that express more than one emotion category, and also non-emotional inner states, in a way that is tailored to specific situations and cultural contexts. The face is a powerful tool for social communication. Facial movements, like reflexive and voluntary motor movements, are strongly context-dependent. Recent evidence suggests that people’s categories for emotions are flexible and responsive to the types and frequencies of facial movements they are exposed to in their environments (Plate, Wood, Woodard, & Pollak, in press).
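The many-to-many mapping can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name `posterior_given_configuration`, the list of inner states, and every probability are hypothetical values chosen for the example, not estimates from any study reviewed here. It shows how a scowl could be more likely during anger than during any other listed state (forward inference) and still be weak evidence of anger once base rates are taken into account (reverse inference).

```python
def posterior_given_configuration(
    likelihood: dict[str, float], prior: dict[str, float]
) -> dict[str, float]:
    """Bayes' rule: P(state | configuration) from
    P(configuration | state) and the base rate P(state)."""
    joint = {state: likelihood[state] * prior[state] for state in prior}
    evidence = sum(joint.values())  # P(configuration) across all states
    return {state: value / evidence for state, value in joint.items()}


# HYPOTHETICAL numbers, for illustration only.
# Forward inference: how often a scowl appears given each inner state.
p_scowl_given_state = {
    "anger": 0.30,
    "concentration": 0.20,
    "confusion": 0.15,
    "other": 0.02,
}
# Base rates: how often each state occurs in the sampled situations.
p_state = {
    "anger": 0.05,
    "concentration": 0.25,
    "confusion": 0.10,
    "other": 0.60,
}

# Reverse inference: how likely each state is, given that a scowl was seen.
posterior = posterior_given_configuration(p_scowl_given_state, p_state)
for state, probability in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P({state} | scowl) = {probability:.2f}")
```

Under these assumed values, anger has the highest probability of producing a scowl, yet concentration is the more probable state when a scowl is observed. Changing the base rates (that is, the context) changes the answer, which is the sense in which the configuration alone is not diagnostic.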

The degree of variation suggested by the published evidence also goes well beyond the hypothesis that the facial configurations in Figure 4 are prototypes or typical expressions, and that any observed variation is merely the result of cultural accents, display rules, suppression or other regulatory strategies, differences in induction methods, measurement error, or stochastic noise (as proposed by various scientists, including Elfenbein, 2013, 2017; Levenson, 2011; Matsumoto, 1990; Roseman, 2011). Instead, the facial configurations in Figure 4 are best thought of as Western gestures, symbols or stereotypes that fail to capture the rich variety with which people spontaneously move their faces to express emotions in everyday life. A stereotype is not a prototype. The distinction is an important one, because a prototype is the most frequent or typical instance of a category (Murphy, 2002), whereas a stereotype is an oversimplified belief that is taken as generally more applicable than it actually is.

The conclusion that emotional expressions are more variable and context-dependent than commonly assumed is also mirrored by the evidence from physiological changes (such as heart rate and skin conductance measures, Box 8, SOM) and even in evidence on the brain basis of emotion (Clark-Polner et al., 2017). The task of science is to systematically document these context-dependent patterns, as well as understand the mechanisms that cause them, so that we can explain and predict them. Clearly, the face is a rich source of information that plays a crucial role in guiding social interaction. Facial movements, when measured in a high dimensional dynamic context, may serve the diagnostic purpose that many consumers of emotion science are looking for (where context can be a cultural context, a specific situation, a person’s learning history or momentary physiological state, or even the temporal context of what just took place a moment ago; Barrett et al., 2011; Gendron et al., 2013).

A Note on the Scientific Literature

Our review identified several broad problems that lurk within the scientific research on facial expressions and that may cause considerable misunderstanding and confusion for consumers of this research. First, statistical standards are commonly adopted that don’t translate well for applying emotion research to other domains, applied or scientific. Showing that people frown when sad or scowl when angry with greater statistical reliability than would be expected by chance may be a scientific finding that warrants publication in a peer-reviewed journal, but above-chance responding is often low in absolute terms, making broad conclusions impossible, particularly for translation to domains of life where a person’s outcomes can be influenced by what emotional meaning perceivers infer. Making inferences based on statistical reliability without concern for specificity and generalizability is similarly problematic. Second, even studies that surmount these common shortcomings often have a mismatch between what is claimed in their conclusions (or in what others claim in reviews or citations of those primary research papers), and what inferences can, in fact, be supported by the results. Third, and relatedly, this mismatch often results from problems in how studies are designed: the particular stimuli used, the tasks used, and the statistical analyses are critically important and constrain what can be observed and inferred in the first place. Fourth, published research on emotional expressions and emotion perception often confounds the measurements made in an experiment with the interpretation of the data, referring without sufficient justification to facial movements as “emotional displays,” “emotional expressions” or even “facial expressions,” rather than “facial configurations,” “facial movements” or “facial actions”; referring to people “detecting” or “recognizing” emotion rather than “perceiving” or “inferring” an emotional state based on some set of cues (facial movements, vocal acoustics, body posture, etc.); and referring to “accuracy” rather than “agreement” or “consensus.”
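The first point, that a result can be reliably above chance and still low in absolute terms, can be illustrated with a short calculation. The sketch below is a minimal example under stated assumptions: the counts are hypothetical, the chance level of 1/6 assumes six equally likely alternatives, and `binomial_upper_tail` is a helper written for this illustration rather than a method used in the studies reviewed.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, chance: float) -> float:
    """Exact one-sided tail P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# HYPOTHETICAL numbers, for illustration only (not taken from any study).
trials = 600      # coded episodes in which an emotion was induced
matches = 150     # episodes showing the proposed facial configuration
chance = 1 / 6    # assumed guessing rate with six alternatives

observed_rate = matches / trials
p_value = binomial_upper_tail(matches, trials, chance)

print(f"observed rate = {observed_rate:.2f} (chance = {chance:.2f})")
print(f"one-sided p-value = {p_value:.1e}")
```

With these assumed counts the departure from chance is statistically unambiguous, yet the proposed configuration is absent in three of every four episodes. A reader who saw only the significance test would miss how limited the configuration is as a basis for inferring emotion in an individual case.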

A Note on Other Emotion Categories

Our conclusions most directly challenge what we have termed the “common view”: that a scowling facial configuration is the expression of anger, a nose-wrinkled facial configuration the expression of disgust, a gasping facial configuration the expression of fear, a smiling facial configuration the expression of happiness, a frowning facial configuration the expression of sadness, and that a startled facial configuration is the expression of surprise. By necessity, we focused our review on evidence about these six emotion categories, rather than the more than twenty emotion categories that are currently being studied, because studies on these six are far more numerous than for other emotion categories. Nonetheless, some scientists claim that these other emotion categories each have distinctive, universal expressions, facial or otherwise, that are modified or accented by culture (e.g., Cordaro et al., 2017; Keltner et al., in press). In our view, such claims rest on evidence that is subject to the same critique as we offered for the research that we reviewed in detail here. In short, even though our review focused on the six emotion categories that are sometimes referred to as “basic emotions,” our observations and conclusions generalize to studies of other emotion categories that use similar methods.

Recommendations for Consumers of Emotion Research on Applying the Scientific Findings

Presently, many consumers of emotion research assume that certain questions about emotional expressions have been answered satisfactorily when in fact this is not the case. Technology companies, for example, are spending millions of research dollars to build devices to read emotions from faces, erroneously taking the common view as the one that is scientifically best supported. A more accurate description, however, is that their technology detects facial movements, not emotional expressions.41 Corporations like Amazon are exploring virtual human technology to interface with consumers. Virtual humans are used to educate children, train physicians, and train the military, as well as to infer psychological disorders, and may eventually even be used to offer treatments. At the moment, the science of emotion is ill-equipped to support these initiatives. Emotional expressions are more variable and context-dependent than originally assumed, and most of the published research was not designed to probe this variation and characterize this context-dependence. As a consequence, right now, the scientific evidence offers less actionable guidance to consumers than is commonly assumed.

In fact, our review of the scientific evidence indicates that very little about how and why certain facial movements express instances of emotion is actually known at a level of detail sufficient for such conclusions to be used in important, real-world applications. To help consumers navigate the science of emotion, we offer some tips for how to read experiments and other scientific papers (Table 8).

Table 8:

Recommendations for reading scientific studies about emotion

1. Take note of whether an experiment is studying expressive stereotypes or more variable facial movements.
2. Take note of data on specificity and generalizability; do not focus solely on reliability at above chance levels.
3. Make a distinction between the data in an experiment (what was measured) and how those data are interpreted.
4. Translate “emotional expressions” or “emotional displays” into “facial movements.”
5. Translate “emotion recognition” into “emotion perception” or “emotion inference.”
6. Translate “accuracy” to “agreement,” “consensus” or “reliability.”
7. Give more weight to studies that measure facial movements or study the perception of facial movements made in more naturalistic settings.
8. Take note of studies that measure or manipulate context.
9. Field studies of people from small-scale, remote cultures are often less well-controlled than studies conducted in the laboratory, but they are invaluable in the information that they provide and should be valued.
10. Remember that emotions are not understood as internal states in all cultures. In some cultures they are understood as situated actions.
11. Do not skip the method and results sections and skip to the discussion to learn the results of an experiment. It is important to know what was measured and observed, not just how scientists interpret their measurements.

More generally, companies may well be fundamentally asking the wrong question. Attempts to simply “read out” people’s internal states from an analysis of their facial movements alone, without considering various aspects of context, are at best incomplete, and at worst entirely lack validity, no matter how sophisticated the computational algorithms. These technological developments are powerful tools to investigate the expression and perception of emotions, as we discuss below. Right now, however, it is premature to use this technology to reach conclusions about what people feel based on their facial movements--which brings us to recommendations for future research.

Recommendations for Future Scientific Research

Specific, concrete recommendations for future research to capitalize on the opportunity offered by current challenges can be found in Table 9, but we highlight a few general points here. Foremost, the expressive stereotypes that summarize the common view, like those depicted in Figure 4, are ubiquitous in published research, but it’s time to move beyond a science of stereotypes to develop a science of how people actually move their faces to express emotion and the processes by which those movements carry information about emotion to someone else (a perceiver). (See Box 16 in SOM for a discussion of information theory as applied to emotional communication). The stereotypes of Figure 4 must be replaced by a thriving scientific effort to observe and describe the lexicon of context-sensitive ways in which people move their facial muscles to express emotion, and the discovery of when and how people infer emotions in other people’s facial movements.

Table 9:

Recommendations for future research

General Recommendations
▪ Take chances on studies that attempt to go beyond merely supporting traditional views of emotion.
▪ Support papers that attempt to study facial movements in real life, measuring context, sampling across cultures even though these studies are often less well controlled than studies in the laboratory, or may use facial stimuli that are less familiar to reviewers than canonical stimulus sets.
▪ Prioritize multidisciplinary studies that combine classical psychology methods with cognitive neuroscience, machine learning, etc.
▪ Support larger scale studies that bridge the lab and the world, that study individual people across many contexts, and measure emotional episodes in high dimensional detail, including physical, psychological and social features; encourage multiple investigators with different areas of expertise to work together.
▪ Support the development of computational approaches.
▪ Create R&D teams that pair psychologists and cognitive scientists trained in the psychology of emotion with engineers and computer scientists.
▪ Increase opportunities to test innovative methods and novel hypotheses, with the acknowledgement that such approaches are likely to elicit resistance from established scientists in the field of emotion.
▪ Generate more studies to identify the underlying neural mechanisms of the production and perception of facial movements.
▪ Direct funding to thornier but necessary new questions and be critical of projects that perpetuate past errors in emotion research.
▪ Direct healthy skepticism to tests, measures, and interventions that rely upon assumptions about “reading facial expressions of emotion” that seem to ignore published evidence and/or ignore integration of contextual information along with facial cues.
▪ Develop systematic, precise ways to describe and/or manipulate the dynamics of specific facial actions.
Stimulus Selection Recommendations
Limitations in stimulus selection can bias results.
▪ For perception studies, incorporate images from the wild (e.g., from multiple internet sources) to capture the full range of facial movements that humans produce in their everyday lives.
▪ For both production studies (where stimuli are designed to evoke emotion) and perception studies, build variation into stimulus sets so conclusions about emotion categories are not inferred (or evoked) from limited stimuli. Consider randomly sampling a variety of stimuli for a given category and treating stimuli as a random variable.
▪ For production studies, ensure that multiple stimuli per emotion category are used to evoke an emotion.

Little is known about the dynamics of the production and perception of emotion signaling.
▪ For perception studies, use dynamic images rather than rely on still images. For production studies, code the temporal dynamics of facial movements.
▪ Attempt to determine full dynamics and the apex of an emotion signal, changes to AUs as signals emerge and recede, and whether the kinematics of distinct AUs are similar or different across sequences or phases of emotion signaling.
▪ Ensure sufficient temporal resolution to allow for event segmentation to be assessed in perception studies.

The role of context is hotly debated, but rarely measured.
▪ Manipulate (or at least measure) the context in which target stimuli are perceived to evaluate whether data are truly stimulus-specific or influenced by context features.
▪ Describe in a systematic way the differences in context, whether for production or perception studies. Theories about the effects of context cannot be resolved until we address how to measure and quantify context.

Sample Selection Recommendations
Cross-cultural studies can provide powerful insights, but are limited in number and scope.
▪ Quantify, as best as possible, participants’ degree of exposure to the West, as well as the amount and type of formal schooling made available to participants.
▪ Harness technology to collect larger numbers of images and video sequences of facial movements across cultures. Use unlabeled classification approaches to discover emotion categories and their expressive forms, rather than continuing to ask whether other cultures are similar to the US. Remember that emotions and mental inferences may be understood differently in different cultures.

Task and Method Design Recommendations
Measurement versus interpretation of emotion is often blurred in research studies.
▪ Contrast more than one “emotion” category with a baseline, so that conclusions about a specific emotion category are not drawn from a comparison of an emotion versus a no emotion condition.
▪ Compare multiple emotion categories to non-emotion categories in a given study.

New insights about emotion are constrained by reliance on, and assumptions about, traditional categories.
▪ Measure emotional episodes in a multimodal way and attempt to discover explicit criteria for when an emotion is present or absent. Such discovery may require within-person approaches.
▪ Sample broader categories of possible emotion states than the limited categories used in prior research (move beyond categories such as happiness, anger, sadness, fear, etc.). Test for variations in intensity within these categories and similarity across categories.
▪ Unless a study design is completely data-driven, explicitly state the theoretical priors of the research team. The distinction is between whether you are seeking to discover versus verify emotion categories. Both approaches are valid, but should be clearly articulated.

Data Analysis Recommendations
Findings are limited by a failure to consider issues related to forward and reverse inference.
▪ Address issues of reliability and specificity when presenting data on emotion expression and emotion perception.
▪ Use formal signal detection analytics and information theoretic measures rather than relying on frequency or levels of agreement. Consider using Bayesian methods so that the null hypothesis can be tested directly.
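
The data analysis recommendation above can be sketched in a few lines. The example below is a minimal illustration, not a prescribed analysis: the 2 x 2 table of counts is hypothetical, and `d_prime` and `mutual_information` are helper names introduced here. It computes a signal detection sensitivity index (d') and the mutual information between the facial configuration shown and the label chosen, two measures that, unlike raw agreement, take false alarms and response base rates into account.

```python
from math import log2
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).
    Rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


def mutual_information(counts: list[list[int]]) -> float:
    """Mutual information in bits between the row variable and the
    column variable of a table of counts."""
    total = sum(sum(row) for row in counts)
    row_p = [sum(row) / total for row in counts]
    col_p = [sum(col) / total for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n:  # empty cells contribute nothing
                p = n / total
                mi += p * log2(p / (row_p[i] * col_p[j]))
    return mi


# HYPOTHETICAL counts, for illustration only.
# Rows: configuration shown (scowl, other configuration).
# Columns: label chosen by the perceiver ("anger", any other label).
counts = [[60, 40],
          [30, 70]]

hit_rate = counts[0][0] / sum(counts[0])          # "anger" chosen for scowls
false_alarm_rate = counts[1][0] / sum(counts[1])  # "anger" chosen for non-scowls

sensitivity = d_prime(hit_rate, false_alarm_rate)
information = mutual_information(counts)

print(f"hit rate = {hit_rate:.2f}, false-alarm rate = {false_alarm_rate:.2f}")
print(f"d' = {sensitivity:.2f}")
print(f"mutual information = {information:.3f} bits")
```

In this assumed table, 60% agreement on scowls looks respectable in isolation, but the label "anger" is also applied to 30% of other configurations, so the configuration carries only a small fraction of a bit about the label. Hit or false-alarm rates of exactly 0 or 1 would need a correction before computing d', which this sketch does not attempt.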

New research on emotion should consider sampling individuals deeply, with high dimensional measurements, across many different situations, times of day, etc.: a big data approach to learn the expressive repertoires of individual people. In the ideal case, videos of people in natural situations could be quantified by automated algorithms for various physical features such as facial movements, posture, gait, and tone of voice. To this we could add the sampling of other physical features such as ambulatory monitoring of autonomic nervous system changes to sample the internal milieu of people’s bodies as they dynamically change over time, ambulatory eye-tracking to assess gaze and attention, and ambulatory brain imaging such as EEG and optical brain imaging (fNIRS). The failure to find reliable “fingerprints” for emotion categories stems, at least in part, from the same reason there are no reliable facial movements to express these categories: approaches have ignored meaningful variability due to context. There is also Bluetooth technology to capture the physical spaces people inhabit (which can be quantified for various structural and social descriptive features such as how much light and noise they are exposed to), whether they are with another person, how that person reacts, and so on. Rich, multimodal observations could, in principle, be available from videos, which, when time-synched with the other physical measurements, could be extremely useful in understanding the conditions for when certain facial movements are made and what they might mean in a given context. Naturally, big data in the absence of hypotheses is not necessarily helpful.

People could be offered the opportunity to annotate their videos with subjective ratings of the features that describe their experiences (whether or not they are identified as emotions). Candidate features are affective properties such as valence and arousal (see Box 9 in SOM). The features might also be appraisals as descriptions of how a situation is experienced (Barrett, Mesquita et al., 2007) and have the potential to add to the high dimensional characterization of what causes facial movements and what they mean.42 Such an approach introduces various technical and modeling challenges, but this sort of deeply inductive approach is now within reach.

Another opportunity for high dimensional sampling involves interactions with virtual humans. Because virtual humans can realize contingent behavior in rich social interactions under strict and precise experimental control, they can provide a richer, more natural context in which to study emotional expressions and emotion perception than is true for traditional laboratory studies, while retaining the experimental control whose absence limits the causal inferences that can be drawn from ethological studies.

To date, this potential has not been exploited to explore the reliability and specificity in context-sensitive relations between facial movements and mental states. As we noted earlier, most of the systems are now designed to teach people a variety of skills, where the goal is not to assess how well participants perceive emotions in facial movements under realistic, socially ambiguous conditions, but instead to program expressive behaviors into virtual humans that will motivate people to learn the needed skills. In these experiments, the psychological realism of facial movements is often secondary to the primary goals of the experiment. A scientist might even program a virtual human with behavior or appearance that is unnatural or infeasible for a human (i.e., that is supernormal) so that a participant can unambiguously interpret and be influenced by the agent’s actions (Tinbergen, 1953; D. Barrett, 2007).

Nonetheless, the scientific approach of observing people as they interact with artificial humans holds great promise for understanding the dynamics and mechanisms of emotion perception and may get us closer to understanding human emotion perception in everyday life. Virtual humans are vivid. Unlike more passive approaches to evoking emotion such as viewing videos or images of facial configurations, a virtual human engages a human participant in a direct, social interaction to elicit perceptual judgments that are either directly reported or inferred from behaviors measured in the participant. Virtual humans are also highly controllable, allowing for more precise experimentation (Blascovich et al., 2002). A virtual human’s facial movements and other details can be repeated across participants, offering the potential for robust and replicable observations. Numerous studies have demonstrated that humans are influenced by virtual humans (e.g., Krumhuber et al., 2007; McCall et al., 2009). For example, human learners are more engaged by virtual agents who move their faces (and modulate their voices), leading them (the real humans) to an increased sense of self-efficacy. As a consequence, virtual humans potentially “allow for the study of emotion in a rich virtual ecology, a form of synthetic in vivo experimentation”. When combined with the high dimensional sampling we described earlier, there is the potential to revolutionize our understanding of emotional expressions by asking different questions than those encouraged by common views. Automated algorithms using data captured from videos offer substantial improvements with a data-driven, unsupervised approach. The result could be robust descriptions of the context-sensitive nature of emotional expressions, which are currently missing and which would set the stage for a more mechanistic, causal account of emotions and their expressions.

An ethology of emotions and their expressions can also be pursued in the lab. Experiments can go beyond a study of how people move their faces in a single situation chosen to be most typical of a given emotion category. Most studies to date have been designed to observe facial movements in only the most typical situations. Future studies should examine emotional expression and perception across a range of situations that vary systematically in their physical, psychological, and social features, and aim to understand both the various ways that humans acquire the skills to express and perceive emotion, as well as the conditions that can impair the development of these processes.

The shift towards more context-sensitive scientific studies of emotion has already begun (see Box 3 in SOM), but it currently falls short of what we are recommending. Non-scientists (and some scientists) still anchor on the common view and only slowly shift away from it (Wilson et al., 1996). The pervasiveness of the common view supports strong convictions about what it is that faces signal, and people often continue to hold to those convictions even when they are demonstrably wrong (Barrett, 2017a; Todorov, 2017). Such convictions reflect cultural beliefs and stereotypes, however. This state of affairs is not unique to the science of emotional expression or to the science of emotion more generally (Kuhn, 1962).

In our view, the scientific path forward begins with the explicit acknowledgement that we know much less than we thought we did, providing an opportunity to cultivate the spirit of discovery with renewed vigor and take scientific discovery in a new direction (Firestein, 2016). With this context of discovery comes the sobering realization that those of us who cultivate the science of emotion and the consumers who use this research should seriously question the assumptions of the common view and step back from what we thought we knew about reading emotions in faces. Understanding how best to infer someone’s emotional state or predict someone’s future actions from their facial movements awaits the outcomes of future research.

Supplementary Material


We thank Jose-Miguel Fernandez-Dols and James Russell for providing us with a copy of their edited volume before its publication. This article benefited from specific discussions with Linda Camras, Pamela Cole, Maria Gendron, Alan Fridlund, and Tuan Le Mau. We are grateful to Jennifer Fugate for her assistance with constructing Figure 4, and in particular for her FACS coding the facial configurations presented in Figure 4. We also send thanks to Sheri Widen who provided assistance on Box Figure 10-1. Many thanks also to Linda Camras, Vanessa Castro, Carlos Crivelli and David Cordaro for providing us with additional details of their published experiments. And thanks to Jeff Cohn for his guidance on acceptable inter-rater reliability levels for FACS coding. We are also deeply grateful to those friends and colleagues who invested their time and efforts in commenting on an earlier draft of this paper (although the summaries of published works and conclusions drawn from them reflect only the views of the authors): Linda Camras, Maria Gendron, Rachael Jack, Judy Hall, Ajay Satpute, and Batja Mesquita. The paper was supported by grants from the U.S. Army Research Institute for the Behavioral and Social Sciences (W911NF-16-1-019), the National Cancer Institute (U01 CA193632) and the National Institute of Mental Health (R01 MH113234 and R01 MH109464) to LFB; the National Science Foundation (CMMI 1638234) to LFB and SM; NIMH grant 2P50MH094258 to RA; the National Institutes of Health (R01-DC-014498, R01-EY-020834) and the Human Frontier Science Program (RGP0036/2016) to AMM; NEI grant R56 EY020834 to AMM and LFB; the National Institute of Mental Health (R01 MH61285), the National Institute of Child Health and Human Development (U54 HD090256) and a James McKeen Cattell Fund Fellowship to SDP; and the Air Force Office of Scientific Research (FA9550-14-1-0364) to SM.
The views, opinions, and/or findings contained in this paper are those of the authors and shall not be construed as an official U.S. Department of the Army position, policy, or decision, unless so designated by other documents.


Accuracy: Extent to which a participant’s performance corresponds to the intended performance on an experimental task. Critically, this requires proper experimental task design, so that the intended correct performance is perceiver-independent, and not subject to the whims of the experimenter.
Affect: A general property of experience that has at least two features: pleasantness or unpleasantness (valence) and degree of arousal. Affect is part of every waking moment of life and is not specific to instances of emotion, although all emotional experiences have affect at their core.
Appraisal: Scientists use the word “appraisal” either to describe how a situation is experienced (e.g., a situation is experienced as novel) or to refer to a literal cognitive mechanism that causes those features of experience (e.g., an evaluation or judgment of whether or not a situation is novel).
Approach/avoidance: A fundamental dimension of motivated behavior. It is different from valence, which is a dimension of experience rather than of behavior.
Category/Categorization: The psychological grouping of a collection of objects, people or events that are perceived to be similar in some way. May be done consciously or unconsciously. May be explicit (as when applying a verbal label to instances of the grouping) or implicit (treating instances the same way or behaving towards them in the same way).
Choice-from-array tasks: Any judgment task that asks research participants to pick a correct answer from a small selection of options provided by the experimenter. For example, in the study of emotion perception, participants are often shown a posed facial configuration depicting an emotional expression (e.g., a scowl), along with a small selection of emotion words (e.g., “angry,” “sad,” “happy”) and asked to pick the word that best describes the face.
Common view: In this paper, the most predominant view about how emotions are related to facial movements. While difficult to quantify, we characterize it through examples, e.g., an internet Google search (Box 1, SOM). The common view holds that (a) certain emotion categories reliably cause specific patterns of facial muscle movements, and (b) specific configurations of facial muscle movements are diagnostic of certain emotion categories. See Figure 4.
Conditional probability: The probability that an event “X” will occur given that another event “Y” has already occurred, or p(X∣Y). If “X” is a frown and “Y” is sadness, then p(frown∣sadness) is the conditional probability that a person will frown when sad. See also forward inference, reverse inference.
Configural (vs. featural) perception of a face: The visual analysis of something, like a face, that is holistic, meaning that the face is visually analyzed as a gestalt (or whole unit) that incorporates features and their relations. Featural processing means that individual features are perceived independently, without reference to one another.
Confirmation bias: The tendency to search for, remember, or believe evidence that is consistent with one's existing beliefs or theories, rather than evidence that is inconsistent with them.
Congenitally blind: People who are born without vision. In the literature, there is considerable heterogeneity, with some people being truly blind from the moment they are born, but others having severe visual impairments short of complete blindness or becoming blind in infancy. If the cause is peripheral (in the eyes rather than the brain), such individuals may still be able to think and imagine very similarly to sighted individuals.
Consistency: An outcome that does not vary greatly across time, context, or different individuals (see forward inference). Consistency is not accuracy (a group of people can consistently believe something that is wrong).
Discrimination: In psychophysics, to judge that two stimuli are different from one another; separate from identifying what they are (identification) or what they mean (recognition).
Ecological validity: Refers to the extent to which the findings of a research study are able to be generalized to real-life settings; the extent to which an experimental protocol captures valid aspects of the real world (related to in-the-wild).
Emotional episode: A window of time during which there is an emotional instance. Often, but not always, accompanied by an experience of emotion, and sometimes, but not always, involves an emotional expression.
Emotional expression: A facial configuration, bodily movement, or vocal expression that reliably and specifically communicates an emotional state. Many so-called emotional expressions are in fact errors of reverse inference on the part of perceivers (e.g., an actor crying when not sad).
Emotional granularity: Experiencing or perceiving emotions according to many different categories (e.g., low granularity = angry, sad, and afraid are all synonyms of “unpleasant;” high granularity = “frustration,” “irritation,” and “rage” are all distinct from one another and from “anger”).
Emotional instance (or instance of emotion): An event categorized as an emotion. For example, an instance of anger is the categorization of an emotional episode of anger. In cognitive science, an instance is called a “token” and the category is called a “type”. So, an instance of anger is a token of the category anger. (see Emotional episode).
Face inferiority effect: A phenomenon observed in emotion perception studies of toddlers and young children. They have difficulty inferring the causes for emotions depicted in facial movements alone when compared to inferring the causes of emotions depicted with stories or words.
Facial action coding system (FACS): A system to describe and quantify visible human facial movements.
Facial configuration: A pattern of visible contractions of multiple muscles in the face; the production analog to configural perception of faces. Configurations can be described objectively (e.g., with FACS coding). Not synonymous with “facial expression”, which requires an inference about how the facial configurations were caused.
Facial expression: A facial configuration that someone infers is expressing an internal state. Facial expressions of emotion are configurations that perceivers reverse infer to have been caused by an internal emotion state; they are thus perceiver-dependent.
Facial movement: A facial configuration that is objectively described in a perceiver-independent way. This description is agnostic about whether the movement expresses an emotion and does not use reverse inference. FACS coding is an example.
Forced-choice task: An experimental task in which a participant must choose between options provided by the experimenter.
Forward inference: Inferring an effect from knowing its cause. An example would be the conditional probability of observing a frown given we know somebody is angry, p(frown∣anger).
Free labeling: An experimental task that is not forced-choice, but in which the participant generates words of her/his choosing.
Generalization: The replication of research findings across different settings, samples, or methods. Generalizability can be weak, for instance if a finding replicates to a limited extent, or strong, if it replicates across very different methods and cultures.
Habituation task: In a habituation task, infants are repeatedly shown objects or images that belong to the same category. When subsequently shown a novel stimulus (one that is not experienced as similar to the others), infants look longer at it. Used to infer how infants categorize stimuli.
In the wild: In the real world (vs. in the lab). Related to ecological validity.
In-group advantage: In sociology and social psychology, an in-group is the social group of which a person psychologically feels they are a member; typically, people have more visual experience and familiarity with in-group members. In-group advantage refers to the often superior ability to perceive faces or voices from one’s in-group, as compared to from an out-group.
Mental inference/mentalizing: Assigning a mental cause to actions; also sometimes referred to as “theory of mind”. The reverse inference of attributing emotions from seeing facial movements is an example of mentalizing.
Meta-analysis: A method for statistically combining findings from many studies.
Multimodal: Combining information from more than one of the senses (e.g., vision and audition).
Null hypothesis: The hypothesis or default position that there is no relationship between dependent and independent variables. The probability of observing results that support the null hypothesis is chance level, i.e. what would obtain if observations are random, or permuted. Consequently, if the null hypothesis is true, the distribution of p-values is uniform (every possible outcome has an equal chance).
Perceiver-dependent: Interpretation of an observation that depends on human judgment. Perceiver dependency can produce conclusions that are consistent across people but not accurate or valid.
Perceiver-independent: An observation that does not depend on human judgment. Although some philosophers argue that all observations require some human judgment, there are degrees of dependency. Judging that a flower vase is rectangular or oval is relatively perceiver-independent, whereas judging whether it looks nice is perceiver-dependent.
Percent agreement: A measure of agreement between raters; high agreement produces high inter-subject consistency. Percent agreement is not the same as percent accuracy, since the former is more perceiver-dependent than the latter.
Perceptual matching task: An experimental task that requires research participants to judge two stimuli, such as two facial configurations, as similar or different. This only requires discrimination, not categorization, recognition, or naming.
Priors: Background beliefs. In the context of Bayes's Theorem, the belief that a hypothesis is true depends not just on the evidence presented but also on the strength of prior beliefs. If a person has a strong prior, this may result in a confirmation bias.
Prototype: The most frequent or most typical instance of a category. Distinct from stereotype: A group of people may have a perceiver-dependent stereotype that is an inaccurate representation of the prototype.
Recognition: Acknowledging something’s existence (which is confirmed to exist by perceiver-independent means). Contrast with perception (which involves inference and interpretation).
Replication: The extent to which new experiments come to the same conclusions as a previous study. Strong replications generalize well: similar conclusions are obtained even when the new experiments use different subject samples, stimuli, or contexts.
Reverse correlation: A psychophysical, data-driven technique for deriving a representation of something (e.g., an image of a facial configuration) by averaging across a large number of judgments.
Reverse inference: Inferring a cause from having observed its purported effect. For instance, inferring that a scowl means someone is angry (the conditional probability, p(anger∣scowl)). In general, reverse inference is poorly constrained, since multiple causes are usually compatible with any observation.
Sensory modalities: The different senses: vision, hearing, etc.
Specificity: Research conclusions that include positive as well as negative statements. For instance, concluding that a frown signals anger but that other facial movements do not signal anger, and that a frown does not signal emotions other than anger. High specificity helps make reverse inference valid. Ideally, research conclusions feature both high specificity for some domains, and high generalizability for others (e.g., that a frown signals only anger, but does so across all people and cultures).
Statistical learning: Detecting statistical regularities from an environment; learning to recognize patterns.
Stereotype: A widely held but inaccurate belief about a person or category.
Universal: Something that is common or shared by all humans. The source of this commonality (innate or learned) is a separate issue. If an effect is universal, it generalizes across cultures.
Validity: Whether an observed variable actually measures what is claimed. E.g., whether a facial movement indicates an emotion (construct validity), or is specific for a particular emotion (discriminative validity).
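
The glossary entries for conditional probability, forward inference, and reverse inference can be made concrete with a small worked example. The sketch below is illustrative only: the counts are hypothetical and are not taken from any study reviewed in this paper, and the helper name `prob` is our own. It shows that a forward inference and the corresponding reverse inference are different quantities, linked by Bayes's Theorem through the base rates of the emotion and of the facial movement.

```python
from fractions import Fraction

# Hypothetical counts of observed episodes (illustrative assumption, not real data).
# Each key is (emotion category, facial movement); each value is a number of episodes.
counts = {
    ("anger", "scowl"): 30,
    ("anger", "no_scowl"): 70,
    ("not_anger", "scowl"): 60,
    ("not_anger", "no_scowl"): 340,
}

def prob(event, given=lambda key: True):
    """Return p(event | given) from the counts table as an exact fraction."""
    given_total = sum(n for key, n in counts.items() if given(key))
    joint_total = sum(n for key, n in counts.items() if given(key) and event(key))
    return Fraction(joint_total, given_total)

def is_anger(key):
    return key[0] == "anger"

def is_scowl(key):
    return key[1] == "scowl"

# Forward inference: how often is a scowl observed, given anger?
forward = prob(is_scowl, given=is_anger)   # p(scowl | anger)

# Reverse inference: how often is anger present, given a scowl?
reverse = prob(is_anger, given=is_scowl)   # p(anger | scowl)

# Bayes's Theorem recovers the reverse inference from the forward one
# together with the base rates p(anger) and p(scowl).
bayes = forward * prob(is_anger) / prob(is_scowl)

print("p(scowl | anger) =", forward)
print("p(anger | scowl) =", reverse)
print("via Bayes's Theorem =", bayes)
```

In this toy table most scowls occur outside of anger, so observing a scowl is weak evidence for anger even though angry people scowl well above the assumed base rate. This is one way to see why reverse inference is poorly constrained.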


1English does not contain gender-neutral pronouns. As a consequence, we alternate between male and female pronouns.

2Decades of research in social psychology show that humans automatically try to predict other people’s behavior by inferring a mental state – this is called mental state inference or mentalizing, such as when inferring someone’s emotional state (e.g., for a review, see Gilbert, 1998). This research suggests that inference and prediction are not separate steps.

3Bolded words appear in the glossary.

4To be clear, teaching children how to infer emotions in others is not a problem because this skill is related to efficient communication with others. The question is whether children are being taught information that is scientifically valid and generalizable.

5As of November 10, 2018, a website for the Detego Group indicated that “The methods developed (sic) by Paul Ekman are based on 40 years of research and are being taught to the FBI, CIA, Scotland Yard and more forensics specialists around the world” (http://www.detegogroup.eu/paul-ekman-introduction/?lang=en).

6This empirical emphasis is largely consistent with scientists’ explicit reports of what they believe, according to a recent survey from 2014. Two-hundred and forty eight scientists who published peer-reviewed papers on the topic of emotion were asked about their views on what the scientific evidence shows. Of the 149 (60%) who responded, 119 (80%) indicated that they believed compelling evidence exists for the hypothesis that certain emotion categories are expressed with universal facial configurations or vocal signals (Ekman, 2016); no questions about variability were included in the survey.

7In social psychology, this is the distinction between identifying an action and making an inference about the mental cause of the action (Gilbert, 1998).

8This corresponds to the null hypothesis for the true positive (in Figure 3).

9To test the specificity hypothesis, we test something called the false positive: that people frequently scowl when not angry, meaning that they scowl more frequently than chance when fearful, sad, confused, hungry, etc. (see Figure 3). Retaining the null hypothesis for the false positive, that people do not scowl more frequently than they would by chance when fearful, sad, confused, hungry, etc., is equivalent to rejecting the null hypothesis for (i.e., finding support for) the specificity hypothesis. Rejecting the null hypothesis for the false positive, because people scowl when fearful, sad, confused, hungry, etc., in addition to when angry, is evidence of no specificity (i.e., retaining the null hypothesis for the test of specificity).
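
The logic of the true-positive and false-positive tests can be sketched in code. The example below is a minimal illustration under stated assumptions: the chance rate of scowling, the significance threshold, and the observed counts are all hypothetical, and a one-sided exact binomial test stands in for whatever test a given study actually used. The names `binomial_upper_tail`, `CHANCE_RATE`, and `observed` are our own.

```python
from math import comb

def binomial_upper_tail(successes, trials, chance):
    """Return P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical values, chosen only for illustration.
CHANCE_RATE = 0.10  # assumed rate at which scowls occur by chance
ALPHA = 0.05        # assumed significance threshold

# category -> (episodes with a scowl, total episodes); not real data
observed = {"anger": (30, 100), "fear": (25, 100), "sadness": (9, 100)}

# For each category, test the null hypothesis that scowls occur at chance level.
exceeds_chance = {}
for category, (scowls, episodes) in observed.items():
    p_value = binomial_upper_tail(scowls, episodes, CHANCE_RATE)
    exceeds_chance[category] = p_value < ALPHA

# True positive: scowling exceeds chance during anger (evidence of reliability).
reliability_supported = exceeds_chance["anger"]

# False positive: scowling exceeds chance during any other category.
# Specificity is supported only if the null is retained for every other category.
specificity_supported = not any(
    exceeds for category, exceeds in exceeds_chance.items() if category != "anger"
)

print("exceeds chance:", exceeds_chance)
print("reliability supported:", reliability_supported)
print("specificity supported:", specificity_supported)
```

With these made-up counts, scowling exceeds chance during anger and also during fear, so the sketch reports reliability without specificity.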

10Our decision to focus on the anger, disgust, fear, happiness, sadness and surprise categories was reinforced by two observations. First, consider a recent poll that asked scientists about their beliefs (Ekman, 2016). Two-hundred and forty eight scientists who published peer-reviewed papers on the topic of emotion were given a list of 18 emotion labels and were asked to indicate which, according to available empirical evidence, have been established as biological categories with universal expressions. Of the 149 (60%) who responded,

“There was high agreement about five emotions … : anger (91%), fear (90%), disgust (86%), sadness (80%), and happiness (76%). Shame, surprise, and embarrassment were endorsed by 40%–50%. Other emotions, currently under study by various investigators drew substantially less support: guilt (37%), contempt (34%), love (32%), awe (31%), pain (28%), envy (28%), compassion (20%), pride (9%), and gratitude (6%).” (Ekman, 2016, p. 32, italics added).

Second, there is no smoking gun in the published research on these additional emotion categories – that is, there are no scientific findings related to the production or perception of facial expressions for those emotion categories that thus far challenge the general conclusions of this paper. Simply put: regardless of how few or how many emotion categories we evaluated, the findings are the same.

11Different numbers of facial muscles are reported in various sources depending on how muscles are grouped or divided.

12From http://erikarosenberg.com/facs/: “scientists often refer to a set of actions that occur on the face simultaneously as ‘facial events,’ rather than calling them facial expressions. It is more descriptive. The word ‘expression’ suggests that something from the inside becomes observable on the outside. Yet not every facial behavior expresses an internal state – most probably do not.”

13See https://how-emotions-are-made.com/notes/facial_action_coding

14Box 6 in SOM presents a summary of computer vision algorithms for automatically detecting facial actions.

15Changes in illumination and face orientation are currently major hurdles.

16Thirty-eight groups, each with their own face reading algorithm, announced their intention to participate in the challenge (Benitez-Quiroz et al., 2017a). Groups tuned their algorithms on the set of training images that were provided two weeks before the challenge deadline. Final evaluations were done on the testing set only. Of the original 38 groups, only four submitted results before the challenge ended.

17These accuracy levels might be considered an upper estimate because of the characteristics of the training and test image databases. The methods for choosing the database are described in Benitez-Quiroz et al. (2016), although we provide a few important details here. Note that a number of images are posed and professionally taken.

Some facial configurations are exaggerated. Under these idealized circumstances, manual verification of these faces was estimated at 81% accuracy.

18It is also possible that an individual person has a variety of probabilistic physical changes that reliably and specifically occur during the instances of a single emotion category, but for a number of reasons this hypothesis has not yet been scientifically tested. Specific studies to address this question would be very helpful.

19There are ways to get around this circularity by using unsupervised, data-driven methods to discover categories, but to date, studies have used supervised approaches where categories are prescribed by human inference.

20By relying on their own beliefs, scientists are using human consensus to identify when an emotional episode is occurring and which emotion category it belongs to (i.e., when they agree that fear or some other emotion is present, then it is said to be present). It’s important to realize that every single experiment dealing with emotion to date relies on human inference in this way. Consensus inferences are made in many areas of science. In physics and astronomy consensus emerges from expert scientists whose beliefs and assumptions often challenge the common sense view, such as in the case of quantum mechanics, dark matter, and black holes. In other areas of psychology, consensus is used to define many categories, such as memory and attention, as well as psychiatric categories, such as schizophrenia and autism. Even defining depression as a mental vs. a physical illness is a matter of consensus rather than objective ground truth. But it is noteworthy that when it comes to emotions, scientists use exactly the same categories as non-scientists, which may give us cause for concern (as forewarned by William James; James, 1890, 1894). For example, compare the findings in Box 8 with the recent survey of scientists who study emotion (Ekman, 2016): 88 of the 149 scientists who responded continue to believe that certain emotion categories have universal physiological markers, despite meta-analyses showing otherwise.

21These meta-analytic findings are consistent with an earlier summary published by Matsumoto et al. (2008): of the 14 studies using rigorous FACS coding by human experts, only five reported that participants spontaneously displayed some or all of the hypothesized AUs during emotions. This is in contrast to the nine studies using the less reliable EM-FACS coding, all of which reported support. These findings suggest that perceptual bias is more likely to creep in when observers make configural judgments of whether an AU is present or not (e.g., indicating whether or not a participant is smiling, or displaying “happiness”) than when AUs are coded independently, one at a time.

22Remote, small-scale cultures are not untouched by western influences. All cultures have some minimal contact with western cultures (and this was also the case for the seminal papers published by Ekman and his colleagues in the 1970s; Gendron & Crivelli, 2017).

23The Trobriand Islanders are a different ethnic group than the Fore; Trobrianders are subsistence fishermen and horticulturalists living in a small archipelago of islands located 200 km from the mainland (the origin of the original Fore who were photographed). As Crivelli et al. make clear in their paper, these findings are a within-nation rather than a within-culture comparison.

24The value of this particular study is that the researchers not only coded infants’ facial movements but also measured a range of concurrent movements that could support inferences about the infants’ feelings of pleasantness, unpleasantness and level of arousal, termed affect (see Box 9), including increased respiration, withdrawal/leaning away with the body, stilling/freezing, struggling, turning toward the mother, extreme withdrawal, hiding of their faces, squirming, self-stimulation, looking toward mother, pointing at the object, doing a “double-take,” and banging on the table.

25Bennett et al. (2002) note that when the facial actions they observed were thought to be associated with more than one emotion category (e.g., when an infant produced a facial configuration that was a combination of scowling (anger) and pouting (sadness)), they interpreted the expression using the facial actions in only the upper region of the face, which indicates that infants’ facial movements were even more variable than reported in the data tables. A footnote in the paper further indicates that infants produced facial movements that were interpreted to reflect “interest” across all of the eliciting situations, but these facial actions were not included in any data analyses (Bennett et al., 2002, footnote 1). Any facial configuration that included AUs stipulated as interest and AUs for another emotion category was coded as an expression of the other emotion category.

26Also, it is not clear that children find sour foods disgusting (e.g., Rozin et al., 1986). Young children appear to be attracted to many things that adults find disgusting, whereas by the age of five, children have more adult-like behavioral responses and reject them (Rozin et al., 1986). For a discussion of how disgust is learned, see .

27In another naturalistic study, videos of children aged four through seven were downloaded from the internet and FACS coded (Camras et al., 2018). The children were playing “the scary maze game”: a child solves maze after maze of increasing difficulty, only to encounter a screaming, demonic girl from the movie The Exorcist (filmed in 1973). The game is generally thought to evoke an instance of fear (hence the name “scary”), but it may also evoke surprise as the scary stimulus makes a sudden unexpected appearance. Children only produced the wide-eyed, gasping configuration (the proposed facial expression of fear) and/or a startled configuration (the proposed facial expression of surprise) with weak reliability (38% and 10%, respectively).

28By analogy, people who have been blind since birth learn color concepts, such as “red,” “blue,” and “green,” and the relations between these concepts are similar to those of sighted people (e.g., congenitally blind individuals understand that the US concept for “blue” is more similar to “green” than to “red”). The structure of brain regions in visual cortex that represent visual concepts is also virtually indistinguishable in sighted and congenitally blind individuals (Koster-Hale et al., 2014; Wang et al., 2015).

29The onset and severity of blindness vary hugely across studies. Even a small amount of visual experience in infancy or early childhood will influence brain development and provide experiences for learning about emotions (see earlier section on emotion concept development in infants). Helen Keller, for example, could see and hear until she was 19 months old, providing some initial scaffolding for her later ability to communicate.

30For example, recently, Ekman (2017) wrote, “Another challenge to the findings of universality came from the anthropologist, Margaret Mead …. Establishing that posed expressions are universal, she said, does not necessarily mean that spontaneous expressions are universal. I replied (Ekman, 1977) that it seemed illogical to presume that people can readily interpret posed facial expressions if they had not seen those facial expressions and experienced them in actual social life” (Ekman, 2017, p. 46).

31While these findings are instructive, they likely provide a lower limit of the possible real world variation in the facial configurations that express the varied instances of a given emotion category. After all, the internet is a curated version of reality and some frequent facial configurations are likely missing because they are rarely uploaded to the internet. Similarly, some configurations commonly found on the internet might not be commonly observed in the real world.

32 Compare these findings to those from a study that mined images from the internet using a similar but narrower approach and that had two raters use a choice-from-array method to label the images (Mollahosseini et al., 2016).

33 Configuration 3 also resembles people’s beliefs about the configurations that express fear and awe (i.e., the “international core patterns” reported by Cordaro et al., 2017).

34 More generally, participants are more likely to perceive the intended emotion in the hypothesized facial configurations of Figure 4 when they are displayed on dynamically moving, synthetic faces, in video footage of posed facial muscle movements, and even in point-light displays of motion created by facial muscle movements (Bassili, 1979). This “dynamic advantage” sometimes disappears when participants are viewing real human faces (e.g., Gold, Barker, et al., 2013).

35 was chosen as one of the forty studies that changed psychology (Hicks, 2012) and, along with Ekman et al. (1969), is routinely discussed in introductory psychology textbooks.

36 Dioula participants from Burkina Faso in West Africa showed strong reliability for labeling smiling facial configurations as happiness, moderate reliability for labeling frowning facial configurations as sadness, startled facial configurations as surprise, and nose-wrinkled facial configurations as disgust, and weak reliability for labeling scowling facial configurations as anger and wide-eyed gasping facial configurations as fear.

37 For example, a sample of Trobriand Islanders, who are subsistence horticulturalists and fishermen living in the Trobriand Islands of Papua New Guinea, labeled a scowling facial configuration as anger with above-chance reliability (29% of the time), but labeled that facial configuration more frequently with “feels like avoiding a social interaction” (50% of the time) (Crivelli, Russell et al., 2017, Study 2). In fact, the wide-eyed, gasping facial configuration that is thought to be the expression for fear (Figure 4) is understood as an expression of aggression or threat in the Trobriand culture (Crivelli, Russell et al., 2016). Trobrianders uniquely labeled smiling facial configurations as happiness across two studies, but this finding did not replicate in a third sample nor in a sample of Mwani participants, who are subsistence fishermen living on Matemo Island in Mozambique, Africa.

38 The ancestors of the Hadza are thought to have been continuously practicing a hunting and gathering lifestyle for at least the past 50,000 years in their current region of East Africa. Furthermore, Hadza social structure, mobility, residential patterns, and language have thus far remained largely buffered from their interactions with other ethnic groups, which have been sustained for at least the past 100 years (Jones, 2016).

39 The wide-eyed gasping stereotype for fear is thought to have evolved for enhanced sensory sampling that supports efficient threat detection (Susskind et al., 2008). Similarly, the nose-wrinkle stereotype for disgust is thought to have evolved in order to limit exposure to noxious stimuli (Chapman & Anderson, 2012).

40 Interestingly, adult perceivers may have overtly looked at the postures less, but other evidence with the same stimuli suggests that different body contexts influenced how adult participants visually scanned the exact same facial configurations (Aviezer et al., 2008). At the other end of the age spectrum, older adults are also more influenced by context when inferring emotional meaning in facial configurations as compared to young adults.

41 Some applications will not be affected by context because they are not aiming to use facial movements to infer an individual’s underlying emotional state. These initiatives have very specific applications in mind. Examples include detecting pain in patients (Apple), detecting driver drowsiness (Google), creating virtual facial expression stickers or animojis from one’s own facial poses (Facebook, iPhone X), and Alibaba’s “smile to pay.”

42 The word “appraisal” has two meanings in the science of emotion. Here, appraisals simply refer to the descriptive features of how a situation is experienced, such as novelty, goal relevance, etc., without any inference about how those experiential features are caused (e.g., Ortony & Clore, 2013). The other meaning of appraisal refers to the mechanisms that cause the experiential features as components of emotion (e.g., the component process model of emotion, in which appraisals are considered evaluative “checks” that the human mind uses in a serial fashion). There is very little evidence that appraisals are, in fact, causal in nature (for a discussion, see Parkinson, 1997). In some studies, for example, participants are presented with a written scenario that is assumed to automatically trigger a specific sequence of appraisal checks (i.e., cognitive evaluations), which in turn is hypothesized to produce a specific pattern of facial muscle movements. Notice that the main causal mechanisms here, the appraisal checks, are not measured directly but are inferred to have occurred. In other studies, participants are asked to explicitly report on the appraisals they experience, on the assumption that the corresponding “checks” are active. Emerging scientific evidence links appraisals, as descriptive features, to facial movements, although the evidence to date suggests that these relationships are not as consistent or as specific as hypothesized.


