Visual Rhetoric and Semiotic

Summary and Keywords

Visual Rhetoric (VR) is a field of inquiry aiming to analyze all kinds of visual images and texts as rhetorical structures. VR is an offshoot of both visual semiotics, or the study of the meanings of visual signs in cultural contexts, and the psychology of visual thinking (as opposed to verbal thinking), defined as the capacity to extract meaning from visual images. The basic method of VR, which can be traced back to Roland Barthes’s pivotal 1964 article “The Rhetoric of the Image,” is to unravel the connotative meanings of visual images. The picture of a lion, for instance, can be read at two levels. Denotatively (or literally) it is interpreted as “a large, carnivorous, feline mammal of Africa.” This level conveys informational or referential meaning. But the image of a lion in, say, an advertisement or music video invariably triggers a connotative sense, namely “fierceness, ferociousness, bravery, courage, virility.” The key insight of VR is that connotation is anchored in rhetorical structure, that is, in cognitive-associative processes such as metaphor and allusion, which are imprinted not only in verbal expressions, but also in visual images. So, the image of a lion in, say, a logo design for men’s clothing would bear rhetorical-connotative meaning and affect the way in which the clothing brand is perceived. This same basic approach is applied to all visual expressive artifacts, from traditional visual art works to the design of web pages and comic books. VR is showing that visual objects are rhetorical objects and that, therefore, they can be used to influence and persuade people as effectively as rhetorical oratory, if not more so. Given its simple yet effective method of analysis, VR is spreading as a technique to various disciplines, including psychology, anthropology, marketing, and graphic design, among many others, affirming how visual images tap into a system of symbolism that is interconnected with other forms of symbolism and representation.

Keywords: semiotics, rhetoric, visual thinking, visual image, visual text


Visual rhetoric (VR) is the critical analysis of visual texts (paintings, movies, ads, posters, and so on) with the techniques of both semiotics and rhetorical analysis. Semiotics is the discipline that studies signs (any form that has meaning), and rhetoric is the discipline that examines the structure and uses of figurative language (metaphor, metonymy, catachresis, irony, and so on). In addition, VR has extended the traditional view of rhetoric to include the persuasive force of images and not only their structure. VR scholars may analyze the structure of an image (in the content of language or visually), but do so with an eye toward rhetorical consequence: who is persuaded, how, and to what ends.

VR is now a branch, or more accurately a subfield, of anthropology, art theory, psychology, graphic design, marketing, communication, literary analysis, and culture studies. Its basic focus is on the visual processing of forms and their meanings, and on how to read (interpret) visual texts such as ads and films. Visual thinking is the phenomenon of forming thoughts in terms of mental and real-world images, rather than words and their meanings. It has been characterized as the process of perceiving ideas as a series of mental pictures. Philip Yenawine (1997, p. 845) defines it as “the ability to find meaning in imagery”:

It involves a set of skills ranging from simple identification (naming what one sees) to complex interpretation on contextual, metaphoric and philosophical levels. Many aspects of cognition are called upon, such as personal association, questioning, speculating, analyzing, fact-finding, and categorizing. Objective understanding is the premise of much of this literacy, but subjective and affective aspects of knowing are equally important.

Some visual rhetoric scholars believe that images work less through cognition and more through affect, emotion, and embodiment; that is, images are processed through feeling before they are understood at a cognitive level. The psychological and social importance of this form of understanding became evident after the publication of three influential works: Roland Barthes’s article, “Rhetoric of the Image” (1964), Rudolf Arnheim’s book, Visual Thinking (1969), and John Berger’s book, Ways of Seeing (1972). It was Arnheim who actually coined the term visual thinking. All three scholars argued persuasively that visual images and texts (drawings, shapes, pictures, and so on) conveyed as much information and cultural nuance as verbal texts, if not more. A few years later, research by psychologist Eleanor Rosch on mental images suggested that they were not just a result of perceptual mechanisms but also a product of cultural conditioning (Rosch, 1973, 1975, 1981). The empirical work of Abigail Housen, starting in 1993 (see Housen, 2002), also showed that visual thinking was the likely basis for developing critical thinking and its transfer to other skills and content. But already in the 1920s and 1930s, psychologists like Jean Piaget and Lev Vygotsky were claiming that pre-verbal children processed information primarily through visualization (Piaget, 1923, 1936; Vygotsky, 1931, 1962, 1978). By the late 1980s and early 1990s, this confluence of work on visual thinking led to the emergence of visual semiotics as a major branch of semiotics (Saint-Martin, 1990; Santaella-Braga, 1988; Sebeok & Umiker-Sebeok, 1994; Sonesson, 1989; Trifonas, 1996). It is from the work in this field that VR developed gradually as an autonomous discipline by the turn of the millennium (Handa, 2004).

The link between visual semiotics and VR is evident to this day. A basic premise of the latter is, in fact, a virtual “law” of the former—namely, that the meaning and interpretation of visual images vary along cultural lines (Lotman, 1991; Uspenskij, 2001). Even the actual type of image that people will call to mind is guided by cultural factors (Taylor, 1995). When asked to visualize a triangle, for example, people living in Western culture will tend to envision the equilateral triangle, perceiving it to be exemplary or representative of the triangle form itself. Obtuse-angled, right-angled, and acute-angled triangles are perceived, instead, to be subtypes. The reason for this reaches back into the meanings of triangles as both geometrical and aesthetic constructs in ancient Greece—meanings carried over into Renaissance geometry and art where symmetry and perfection of form were praised. The equilateral triangle as a cultural prototype is the result of this tradition, working its way into groupthink through representational practices.

Rhetoric of the Image

If one were to pick a starting point for visual semiotics (and later VR), it would probably be Roland Barthes’s pivotal article, “The Rhetoric of the Image” (1964). Barthes started by noting that the word “image” derives from a Latin term meaning “imitation,” posing the question of how something that is an imitation of something else can be so imbued with meaning. He used an ad for Panzani pasta to address this question (see Figure 1).

Figure 1. Ad for Panzani pasta.

Barthes started his analysis by identifying different levels of meaning in the ad. First, there is the level of denotation consisting of the brand name and logo design. This has pure informational value: it allows viewers to recognize the pasta, should they desire to buy it. However, the name also works at a different semiotic level, since it assigns an aura of “Italianicity” to the whole ad text. Connected to this is the symbolism of the tomato and the other ingredients visible in the ad, which implies Italian cuisine and its supposed superiority, a meaning reinforced by the caption (“à l’italienne de luxe”). This whole system of meanings occurs at the level of connotation, which constitutes a powerful unconscious rhetorical system, as Barthes had also argued in his 1957 book, Mythologies. The system is called a code, into which the ad taps through its visual image (and accompanying caption). Barthes called the initial denotative reading of the ad “non-coded” and the connotative one “coded.” He referred to the ways in which the images and caption led the viewer to the coded meaning as anchorage.

This seemingly simple semiotic analysis, which sets denotation (non-coded meaning) against connotation (coded meaning), has been criticized on several counts, such as ignoring the fact that the ad can be understood across cultures in ways that do not involve meaning dichotomies (Kress & Van Leeuwen, 1996). But the main point of Barthes’s article was that visual images bear more meaning than literally meets the eye (pun intended). They have, in other words, rhetorical force (Beasley & Danesi, 2002). A more contemporary VR analysis of the ad would focus directly on the rhetorical structures that lead to the embedded connotative code. For instance, the tomato is an allusive device, referring to the code without specifying it. It is also a metaphor standing for the main ingredient in Italian pasta dishes. In other words, the kinds of rhetorical strategies that are intrinsic to verbal texts can be found in the ad. The essence of the VR analytical method today is a decoding of the rhetorical structure of visual texts so that the unconscious cultural codes in them can be unraveled.

Barthes’s article marked a critical turning point for the subsequent direction of semiotics, bringing out the signifying power of the visual image. Shortly thereafter, advertising became a major target of analysis both within semiotics and cognate disciplines such as anthropology and psychology. It was instantly evident to those fields that the visual images of ads were effective because, as Bachand (1994, p. 134) later observed, they provide “an opportunity for varied aesthetic experience,” and this experience taps into unconscious forms of visual thinking that guide the mind’s interpretive faculties inexorably.

Already in the 1950s, however, Jacques Derrida was developing a parallel approach to textual analysis, called deconstruction, that years later culminated in his view that the modality of writing influenced cognition more forcefully than did any other modality (see Derrida, 1976). The main argument in deconstruction is that the meaning of a text cannot be determined in any absolute way because it shifts according to who reads it, when it is read, and what critical theory or philosophy is involved in the reading. Every text has built-in assumptions that come from historical writing practices and conventions. Derrida rejected the traditional way that critics interpreted literature as a mirror of life and the view that the author of a work was the source of its meaning. Deconstruction is located under the more general rubric of poststructuralism, associated not only with Derrida but also with another French philosopher, Michel Foucault (1972). The movement wanted mainly to show that signs do not encode reality, but rather construct it. Derrida was fixated on logocentrism, the view that knowledge is constructed by linguistic categories and that these are built on rhetorical practices that literally lead nowhere. From this line of reasoning, Barthes introduced his notion of the rhetorical image and the methodological approach to the meanings of visual texts as oscillating back and forth between the rhetorical and visual-denotative levels.

Visual Semiotics

Although Barthes was a semiotician, his ideas did not catch on broadly within semiotics at first. It was the late 1980s when a distinct branch of the discipline inspired by his work—visual semiotics—emerged. Its aim is to study all kinds of visual images in terms of their implications not only for general sign theory, but also for the psychology of visual thinking generally. Visual texts (cinema, magazines, ads, optical illusions, diagrams, charts, and so on) became major targets of analysis. Visual images were analyzed as special kinds of signs, that is, as signs meant to be seen, rather than heard or read verbally. Among the first works in the field one can mention those by Sonesson (1989, 1994), Saint-Martin (1990), and Sebeok and Umiker-Sebeok (1994). An early classificatory framework of visual signs is the one by Santaella-Braga (1988), which built implicitly on the work of Group µ (1970)—a group of scholars and scientists interested in the rhetorical structure of all signs. From the outset, visual semiotics overlapped with both the study of visual communication in anthropology and of visual thinking and mental imagery in psychology. The overlap, however, had a basis in the history of the discipline. A 19th-century founder of modern-day semiotics, Ferdinand de Saussure (1916, p. 16), had used the word image in his theory of the sign, claiming, in fact, that semiology (his term for the discipline) was to be considered a branch of psychology (itself an emerging field at the time). Saussure defined the sign as a binary structure, consisting of a physical form, which he called the signifier, and a mental part, which he called the signified. He defined a verbal signifier, such as the word cat, as a “sound image” (a sequence of distinct sounds) and its signified (a type of mammal) as the “conceptual image” that the signifier calls to mind. 
Although he did not define a visual sign, by extension it can be characterized as a “visual image” at the level of the signifier and as a “conceptual image,” analogous to the one evoked by a verbal signifier, at the level of the signified. The latter corresponds, roughly speaking, to the psychologist’s term “mental image” and the former to any picture or viewable image.

At first, semioticians used a basic Saussurean framework for carrying out the analysis of visual signs and visual texts. Their basic technique was to identify the visual signs of a text and their connotative meanings via a consideration of visual signifiers such as points, lines, shapes, colors, and other visual forms. With the spread of Peircean semiotics in the late 1980s into the mainstream, semioticians started moving somewhat away from the Saussurean orientation and adopting insights from Peirce’s sign theory. Along with Saussure, Charles S. Peirce is a founder of modern-day semiotics (his views of the sign are found scattered throughout his writings; Peirce, 1931–1958). What made Peircean theory particularly attractive was his insight that our sensory and emotional experiences of the world influence how we create and understand signs—a process that he called semiosis. So, we construct a sign not only because we want to refer to something for practical purposes, or to classify it within some useful category, but also because we wish to interpret the world in specific sensory-affective ways. Especially useful is Peirce’s tripartite typology of signs—the icon, the index, and the symbol. Icons are signs that resemble their referents in some way. Photographs; portraits; and Roman numerals such as I, II, and III are visual icons because they resemble their referents visually. Indexes are signs that involve relation of some kind; that is, they are designed to put referents in relation to each other, to sign users, or to the context in which they occur. An example is the pointing index finger, which someone might use to indicate and locate things, people, and events with respect to himself or herself. Symbols are signs that stand for something in conventional, or conventionalized, ways. For example, the cross figure may stand for “Christianity” and the V-sign for “peace.” These are meanings that the signs have inherited from cultural traditions or historical events.

The study of signs in Peircean terms opened up the field of visual semiotics considerably, given that iconicity, indexicality, and symbolism are all modalities that visual images evoke and which lead to their coded interpretation. Visual signs can also be classified according to value, color, and texture, each of which can have iconic, indexical, or symbolic modalities. Value refers to the darkness or lightness of a line, shape, or entire text. The darkness-lightness dichotomy forms a semiotic opposition, whereby each pole assumes a contrasting connotative value: lightness generally connotes positive culturally based values and darkness an opposite array of negative ones. Saussure (1916, pp. 251–258) defined valeur (value) as the minimal meaning we extract from oppositions: a minimal difference in sound, a minimal difference in tone, a minimal difference in orientation, and so on. Saussure argued that signs, rather than bearing intrinsic meaning, had valeur in differential relation to other signs or sign elements. To determine the value of an American quarter, for instance, one must know that the coin can be exchanged for a certain quantity (a substance) of something different and that its value can be compared with another value in the same system, for example, with two dimes and one nickel. Color is a sign system itself that is used in visual texts to convey mood, feeling, and atmosphere and is thus a constant source of rhetorical nuances. Finally, texture refers to the sensory or emotional experiences that certain visual forms evoke: wavy lines tend to elicit pleasant sensations, whereas angular ones do not.

Today, visual semiotics has become a key analytical tool in the study of online representational practices, which are increasingly multimodal (involving visual, audio, and verbal expressive modalities). Digital signs such as the emoticon (literally an icon that conveys an emotion visually) and the emoji have captured the attention of semiotic analysis because of the suggestive ways in which they relay specific types of emotional information visually (Danesi, 2016). In a phrase, the study of multimodality in online contexts and in all forms of digital communication is becoming a major area of semiotic investigation (see, e.g., Cattuto, Loreto, & Pietronero, 2007; Huang & Chuang, 2009; Ma & Cahier, 2014; Warschauer & Grimes, 2007).

Visual Images

Defining visual images poses a logical problem: one cannot really define them without some use of circular reasoning. Like an axiom in mathematics, therefore, it is assumed that the concept of image is self-evident. Images can be external, like the visual signs in the Panzani ad discussed above, or they can be mental (conceptual images). By and large, psychologists describe the latter as internal representations of real or imaginary things, allowing people to recall, plan for, and predict ideas, events, and so on (Kosslyn, 1983, 1994). They are more precisely called visual percepts. But the internal and external visual forms are interconnected semiotically and psychologically. Psychology has documented extensively that visual perception is not monolithic; it varies considerably among individuals. For example, some are better than others at moving objects around in their heads. They can easily visualize, say, the letter N changing into a Z when rotated in their minds. Others have greater difficulty in envisioning such imaginary rotations. Overall, people can imagine faces and voices, locate imaginary places, scan game boards (like a checker board), and arrange furniture in their minds, but they do so in differentiated ways. Culture also plays a conditioning role in mental imagery. For example, as discussed above, when asked to visualize a triangle, people who have been exposed to classical Greek geometry tend to call the equilateral triangle to mind. Analogously, the image of a cat that comes typically to the mind of subjects living in Western culture is that of a household cat, because it is the most typical cat in that culture.

Mental images are not generated only via visual perception. They can be elicited through other sensory and affective modalities, by imagining such phenomena as the sound of thunder (auditory image), the feel of wet grass (tactile image), the smell of fish (olfactory image), the taste of toothpaste (gustatory image), the sensation of extreme happiness (emotional image), and so on. Moreover, mental images can be evoked purely through imaginative processes. An example is a “winged table.” Although no such thing exists in real life, we nonetheless have no problem imagining it by simply verbalizing it. Another type of visual thinking occurs in the form of narrative images, which unfold like a story. For example, we recall encounters with people in narrative ways, with different images representing different episodes of the encounter in a story-like fashion.

The branch of linguistics commonly called cognitive linguistics has introduced the notion of image schema into the psychological study of visual thinking (Johnson, 1987; Lakoff, 1987; Lakoff & Johnson, 1980, 1999). The image schema is defined as an unconscious outline of a recurrent shape, action, dimension, orientation, object, or idea that guides how we conceptualize abstractions. Consider an impediment. This is anything, such as a wall, a boulder, or another person, that blocks forward movement. Experience informs us that we can go around the impediment, over it, under it, through it, or else remove it and continue on. On the other hand, the impediment could obstruct forward movement, meaning that we would have to stop and turn back. All of these actions can be easily imagined in the mind. They are image schemata that constitute the basis of common expressions such as the following: “We got through that difficult period”; “She felt better after she got over her cold”; “You might want to steer clear of financial debt”; “With most of the work out of the way, I was able to call it a day”; “The rain stopped us from enjoying our picnic”; “You cannot go any further with that plan; you’ll just have to turn back”; and so on. These make sense to us because they are based on the image schema of an impediment, which transforms specific physical experiences associated with impediments into abstract ones via metaphor.

Visual images can also be external, as mentioned. These are essentially visual signs made up of visual signifiers. An example of the latter is color, which has been the subject of extensive study within several fields (Berlin & Kay, 1969; Davidoff, 1991; Hardin & Maffi, 1997; Hatcher, 1974; Hilbert, 1987; MacLaury, 1997; Tufte, 1997). At a perceptual (denotative) level, we interpret colors as gradations of hue on the light spectrum. Hue is the property that leads us to give a color its name (for example, red, orange, yellow, green, blue, or violet). But the naming process is hardly free of cultural-historical factors and practices. The actual color terms we use in English predispose us to see differential categories of hue. Psychologists estimate that we can distinguish perhaps as many as 10 million hues. Clearly, then, our limited number of color terms is far too inexact to describe all the hues we are potentially capable of perceiving. So, each culture comes up with the color terms it needs to describe its particular interpretation of reality. Some cultures need very few; others need many more. The restrictions imposed on color perception by color vocabularies (sign systems) are the reason why people often have difficulties trying to describe or match a certain color. It is also the reason why we use figurative language and other kinds of semantic strategies to expand the color lexicon: pea green, sky blue, maroon, burgundy, and so on.

Throughout the world colors are used for representational purposes and thus tap into connotative codes of meaning. The archeological record strongly suggests, incidentally, that sensory and emotional meanings attached to colors may even have been the source for the color terms themselves (Wescott, 1980). In Hittite, for instance, words for colors initially designated plant and tree names such as poplar, elm, cherry, oak, etc.; in Hebrew, the name of the first man, Adam, meant “red” and “alive,” likely alluding to the importance of blood in sustaining life. Still today, in many languages red connotes “living” and “beautiful.”

Visual Texts

For the present purposes, a text can be defined simply as a composite semiotic form, that is, as a form that has been constructed to represent something by combining “smaller” sign elements or signifiers in some structured way (Sebeok & Danesi, 2000). Texts include diagrams, charts, conversations, poems, myths, novels, television programs, paintings, scientific theories, musical compositions, websites, and the like. Texts are not constructed or interpreted in terms of the individual meanings of their constituent parts added together, however, but holistically as singular signifying structures. Thus, a piece of music is not processed by listeners (or performers) as individual notes coming separately together in the mind, but rather as a holistic composition. Visual texts such as paintings, too, are not experienced as the summation of individual visual elements such as colors, shapes, portraiture sketches, and so on, but, again, holistically. However, the individual elements (single notes, visual cues, and so on) do guide (or constrain) the holistic interpretation, indicating that there is an intrinsic “bimodality” in the processing of texts, whereby the individual elements are the specific elements that constitute the overall text. Showing how this bimodality works in visual texts is a central objective of both visual semiotics and VR.

Consider a few figures that can be made with simple geometrical forms. Among other things, three straight lines can be joined up to iconically represent a triangle, the letter “H,” or a picnic table. These are three different texts (forms) constructed with the same three elements. The way the elements are joined, known as the syntax of the text, leads to its specific form (a triangle, a letter, a table) and thus its meaning. Other examples of bimodality include a circle with two dots for the eyes and one for the nose representing a face. The smiley and winky emoji are constructed and interpreted in this way. Two types of visual texts that have become targets of great interest within visual semiotics are diagrams and charts, since these show clearly how bimodality works (Roberts, 2009; Stjernfelt, 2007). The former are schematic drawings using basic visual elements (points, lines, shapes, and so on) to show how something works or to clarify the relationship between the parts of a whole; the latter are drawings designed to contain and display information. The common use of diagrams and charts throughout the world, and especially in mathematics and science, suggests that visual bimodality is a dominant mode in knowledge-making systems. The early diagrams of the atom as a miniature solar system with a nucleus and orbiting particles were a de facto theory of the atom, allowing scientists to envision it in a particular way. In the history of science, this kind of visual thinking has produced truly remarkable results, since it has allowed for experimentation with reality to occur in the imagination. The results of this experimentation can then be redirected to the real world to see what it yields in real terms.

There are many and varied types of visual texts. Some of these, such as comic books, involve the use of both verbal and visual elements. These can be called “blended” or “hybrid” texts (Danesi, 2016). But it must always be kept in mind that visual representational practices are not universal; they are embedded in cultural contexts. Anthropologists found in the past that people living in societies that had never been exposed to illustrated magazines were unable to recognize the photographs in the magazines as images of human beings or real-world objects (Deregowski, 1982; Dunning, 1991). They tended to perceive them, rather, as smudges on the paper. Such interpretations of the photographs are not due to defects of intelligence or eyesight; on the contrary, the individuals were clear-sighted and highly intelligent. Their primary assumptions were different, because they had acquired a different semiotic system of visual bimodality that blocked them from perceiving the photographs in the same manner as people accustomed to viewing magazines.

Visual semiotics has been instrumental in providing theoretical frameworks for studying and understanding how visual bimodality works psychologically and socially, from the visual arts to advertising and web design (Bogdan, 2002; Crow, 2010; Dillon, 1999; Jappy, 2013; Moriarity, 2005; Tomaselli, 2009; Uspenskij, 2001; Zantides, 2014). But the area where its greatest interest has lain is cinema, which blends various textualities, from the visual and narrative to the musical and performative. Film-makers themselves have experimented with cinematic textualities and how the visual image can drive the narrative by itself. The classic example of this is Godfrey Reggio’s 1983 film Koyaanisqatsi—a visual essay on the state of the world in an age of encroaching technology and automation. The film has no characters, plot, dialogue, or commentary; it is a visual pastiche of jarring and disturbing images of cars on freeways, atomic blasts, litter on urban streets, people walking about mindlessly, decaying housing complexes, buildings being demolished, and other dystopian scenes of the modern world. Reggio incorporates the mesmerizing music of Philip Glass to act as a guide for connecting the images syntactically. The composer’s slow rhythms tire us with their lugubriousness, and his fast tempi—which accompany a chorus of singers chanting in the background—assault our sense of balance. When the visual-musical frenzy finally ends, we feel an enormous sense of relief. The movie starts and ends with images of a vastly different world—the world of the Hopi peoples of the Southwestern United States, a world based on a holistic view of humans interacting with nature, not destroying it. Glass’s choral music in these two segments is imitative of Gregorian chant, standing in contrast to the cornucopia of dissonant images and sounds connected with the modern world. 
The only segment of the movie that uses language is the conclusion, which is a cautionary declaration projected onto the screen: “Koyaanisqatsi (from the Hopi language)—crazy life, life in turmoil, life out of balance, life disintegrating, a state of life that calls for another way of living.”

VR Analysis

Needless to say, the primary investigative object of VR is the visual text. So, its analytical reach spans all visual media, from the visual arts and advertisements to websites, social media, apps, blogs, viral videos, and so on. VR provides critical insights into how rhetorical structure affects us emotionally and cognitively. As mentioned, it emerged as an autonomous field at the start of the 2000s. At first, it had a pedagogical and ethical orientation: to show students (in particular) the power of visual persuasion in a world increasingly based on visual images and communication (Benson, 2015; Gries, 2015; Handa, 2004; Hill & Helmers, 2004; Olson, Finnegan, & Hope, 2008). But it has since expanded to encompass the critical analysis of all forms of visual representation. As mentioned, VR grew out of Barthes’s observation that we read images at a connotative level and thus rhetorically. Its goal is not just to illuminate the structure of a visual text but to show its ethical, social, political, and ideological functions.

One of the first productive uses of VR was in the area of advertising, recalling Barthes’s initial study of the Panzani ad (Danesi, 2008; Phillips & McQuarrie, 2004). As a general example of how VR analysis might unfold, consider several recent ads for women’s lifestyle products that are all seemingly based on codes of femininity of various kinds. Consider, first, a Diesel ad.

The ad shows a young female in the background apparently running away from someone or something with her hands outstretched; in the foreground we see the face of another young female with graffiti written all over it. These are key visual signifiers, indicating an interpretive path to pursue towards unraveling the coded meaning. The background female’s arms suggest wonderment, perplexity, and frustration, a set of emotions that is mirrored in her facial expression. She seems to have escaped from some type of fetter, running away from it. Is she liberated and yet asking what liberation is all about? Is women’s liberation nothing more than a set of graffiti truisms, imprinted on the foreground woman’s face? Is society still “writing the rules” of gender? The ambiguity of the code of female liberation to which the images seem to lead is a powerful one. The bimodal aspect of the text inheres in how the separate images can be combined to produce rhetorical structure. There are several tropes at work here that suggest the coded meaning, including paradox (the contradictory images of women’s liberation), oxymoron (the combination of incongruous images), and especially metaphor as embedded in the image of the woman escaping from a fetter (society’s control of womanhood?). Although this is a simplified analysis of the ad, and one among others, the point is that it is possible and plausible in the first place. This is so because the ad has a rhetorical visual structure with powerful suggestiveness.

The amalgamation of disparate visual images is another basic rhetorical technique in advertising. Blending ideas to produce new meaning is a basic feature of metaphor, whereby two domains of meaning are amalgamated to produce a new system of connotations (Fauconnier & Turner, 2002). In some cases, the whole ad is a metaphorical blend of visual elements, as in the FCUK SS09/New Collections ad: the black-and-white photos in the background focus on the female lips and neck, suggesting the eroticism of those parts. In the foreground, we see a woman in full color displaying both her lips and her neck, which seem to mesmerize her male companion, allowing her to control him. The code here seems to be that of the femme fatale, with the black-and-white photos arguably taking us back in time to images of early femmes fatales who starred in black-and-white photos, posters, and films. Again, there are other interpretive paths that one could take in this case. However, these are still guided by the individual signifiers that coalesce metaphorically (bimodally) to suggest an overall coded interpretation.

Darkness in an ad is often symbolic of the night, when sex and seduction are assumed to take place. As part of its valeur, darkness also connotes evil, mystery, danger, fear, and excitement. It is the paradigmatic opposite of light, which symbolizes the day, and thus (also purportedly) innocence, purity, safety, assurance. This visual opposition appears frequently in lifestyle ads such as the one for Pantene Pro-V, which is yet another example of the code of emerging feminine power (Danesi, 2009): A woman dressed in black signals sensuousness, sophistication, overwhelming allure. Like the black widow spider, she is not to be trusted either. And because of this, her seductive allure is irresistible and thus fatal. This is, in fact, an ancient subtext in the portrayal and depiction of women in all kinds of texts from the Bible to movies such as Fatal Attraction. Barthes referred to this level of textual interpretation as mythological (Barthes, 1957). Barthes’s use of this term was meant to indicate that the ancient mythic themes are recycled in contemporary texts and spectacles in unconscious ways, imbuing them with deeply rooted meaning.

The point to be made here is that ads are read unconsciously in terms of a wide range of coded meanings, much like visual art. Many modern-day artists have, in fact, been hired to create ads and commercials, from Ridley Scott by the Apple Corporation to Salvador Dalí by the Gap and Datsun (Hoffman, 2002).

As another example of bimodality in visual analysis, consider Apple’s 1984 TV commercial, which was shown on January 22, 1984, during the third quarter of Super Bowl XVIII. Obviously evocative of George Orwell’s novel 1984 (published in 1949), and directed by Ridley Scott, whose 1982 movie Blade Runner was already a cult classic at the time, the commercial won countless awards and was characterized by social historians as “the commercial that outplayed the game.” Following is a synopsis of the visual images in the commercial (Danesi, 2008, pp. 81–83).

  • The year 1984 appears at the start.

  • A horde of expressionless men, with shaved heads in prison-like uniforms and boots, are then seen marching mindlessly toward a gigantic TV screen.

  • They all sit as if in an assembly line in a zombie-like state in front of the screen as an Orwellian Big Brother figure shouts meaningless platitudes at them.

  • Then, out of nowhere, a blonde, attractive, athletic woman appears in a white jersey and red shorts running toward the screen.

  • The woman is pursued by a group of storm troopers.

  • She enters the room, hurling a sledgehammer at the television screen, which then explodes.

  • The men remain seated, open mouthed, and dazed.

  • A caption then appears on the screen: “On January 24th, Apple Computers will Introduce Macintosh and you will see why 1984 won’t be like ‘1984.’”

The rhetorical symbolism of the commercial is unmistakable: we are in a new age of womanhood, which can be called the “Eve code,” and the Apple Corporation is leading the way into the future. It is in the way the parts of the commercial cohere into an Orwellian tale of redemption that the interpretive mode crystallizes. Arthur Asa Berger (2000, pp. 126–127) describes the woman in the commercial insightfully as follows:

Who is she? We do not know, but the fact that she exists tells us there must be forces of resistance in this totalitarian society, that not all are enslaved. We see shortly that she is being pursued by a troop of burly policemen who look terribly menacing in their helmets with glass face masks. Her color, her animation, her freedom, even her sexuality serve to make the situation of the inmates even more obvious and pathetic. Her image functions as a polar opposite to the enslaved men, and even though we only see her the first time for a second or two, her existence creates drama and excitement.

The commercial strongly suggests, therefore, that only women can liberate men from the dreary Orwellian world they have created. And the way out of “1984” is the Eve code. Berger (2000, p. 131) puts it eloquently as follows:

The blonde heroine, then, is an Eve figure who brings knowledge of good and evil, and by implication, knowledge of reality, to the inmates. We do not see their transformation after the destruction of the Big Brother figure—indeed, their immediate reaction is awe and stupefaction—but ultimately we cannot help but assume that something will happen and they will be liberated.

In No Caption Needed, Hariman and Lucaites (2011) use a similar interpretive approach to demonstrate how photographs are powerful rhetorical texts capable of influencing public opinion on all kinds of social matters, recalling Barthes’s own analysis of photography (Barthes, 1977, 1981). Photos may seem to be polysemous (having many meanings), but given the context in which they occur (in a particular part of a newspaper, for example) they are anchored in such a way that viewers will be led to the coded message unconsciously.

Even before the advent of VR and visual semiotics, the pop art movement in the 1950s became deeply interested in the rhetorical force of visual images. The movement brought to the forefront the question of the function of the visual arts in human life. For instance, the meanings of Andy Warhol’s painting of a Campbell’s soup can (1964) were debated (and continue to be debated) ad infinitum. When asked what it means, people will either say that it means nothing, or give responses such as, “It is a symbol of our consumer society”; “It represents the banality and triviality of contemporary life”; and so on. The latter pattern of responses suggests that we tend to interpret visual texts always connotatively, even though the painting of a soup can is, at a denotative level, just that—a soup can. The concept of visual art as something to be appreciated individualistically by viewing it in a gallery or museum hides the fact that art in its origins had a public function. Art works were meant to decorate the public square or to commemorate some meaningful event. Only after the Romantic 19th century did the idea of the art gallery as the appropriate locus for appreciating art emerge as an idée fixe. The pop art movement was, indirectly, an attempt to bring art back to the public square, representing the common objects found there.

It should also be mentioned that the importance of the work of the Belgian Group µ, founded in 1967, to the emergence of both visual semiotics and VR cannot be overestimated. The group related verbal discourse to “visual discourse,” showing how visual literacy was as critical to understanding human activities as was verbal literacy. Group µ’s 1970 publication, A General Rhetoric, reformulated classical rhetoric in a new semiotic way, classifying images according to their different semiotic modalities. In Traité du signe visuel (1992), the group elaborated a grammar of the image. Their main point was that images cut across all systems and codes.


In his 2001 book, Media Unlimited: How the Torrent of Images and Sounds Overwhelms Our Lives, Todd Gitlin decries how the modern-day media provide a constant barrage of visual images that wash over audiences but accumulate in groupthink to blur the line between reality and its imagistic representation. “Images,” he states, “depict or re-present realities but are not themselves realities” (Gitlin, 2001, p. 22). We know the difference, but we prefer the virtual to the real, a point made as well by Jean Baudrillard (1983), who referred to this blurring of the line as the simulacrum effect. This means, essentially, that the power of visual images is such that we can no longer distinguish, or want to distinguish, between the real and the hyperreal (the world created by images). One of the themes within VR today is, logically, the study of this effect, especially as it takes hold on people through digital media. The term simulacrum comes from Latin, where it meant “likeness” or “similarity,” and was used in the 19th century by painters to describe drawings that were seen merely to be copies of other paintings, rather than emulations of them. Aware of this designation of the term, Baudrillard insisted that a simulacrum is not the result of a simple copying or imitation, but a form of false consciousness that emerges on its own after long exposure to images through four stages: (1) a basic reflection of reality (the normal state of consciousness); (2) a perversion of reality; (3) a pretense of reality; and (4) the simulacrum, which bears no relation whatsoever to reality.

An example he used to make his point was that of Disney’s Fantasyland and Magic Kingdom, which are copies of other fictional worlds. They are copies of copies, and people appear to experience them as more real than real. They are simulacra that reproduce past images to create a new social environment for them. Through simulacra, Baudrillard claimed, people construct their social identities, and this has far-reaching emotional implications. Hyperreal worlds are experienced as more meaningful than real worlds, which are perceived as banal and boring. Eventually, as people engage constantly with the hyperreal, everything, from politics to art, becomes governed by simulacra. Only in such a world is it possible for advertising, the supreme manufacturer of simulacra, to become so powerful.

The 1999 movie The Matrix understood Baudrillard’s warning perfectly, portraying a world in which life is shaped by the computer screen. Like the main protagonist, Neo, we now experience reality “on” and “through” the computer screen, and our consciousness is largely shaped by that screen, whose technical name is the matrix, the term for the network of circuits that defines computer technology. The same word also meant “womb” in Latin. The movie’s transparent subtext is that people are now born through two kinds of wombs: the biological and the technological.

There is no culture without visual textual traditions and customs. These bear witness to the fact that visual thinking is just as crucial to human understanding as verbal thinking, if not more so. It is often said that we live in a visual culture, where the image is much more powerful emotionally than the spoken word. But this is an oversimplification. Humans have always created and experienced visual forms throughout their history. The difference is one of degree: today visual images seem to be more dominant in social communication than they have ever been in the past. VR has emerged as a critical discipline for examining the power of these images with the tools of semiotics and rhetorical analysis. But a caveat is in order, expressed eloquently by Sonja K. Foss (2005, p. 150): “Not every visual object is visual rhetoric. What turns a visual object into a communicative artifact—a symbol that communicates and can be studied as rhetoric—is the presence of three characteristics. The image must be symbolic, involve human intervention, and be presented to an audience for the purpose of communicating with that audience.”

