Index’s Limit
Matthew Ellis
A famous shot from Jean-Luc Godard’s 2 or 3 Things I Know About Her (1967) depicts the contents of a coffee cup in an unbroken close-up, foam from a recently deposited sugar cube swirling round and round in a hallucinatory circle. As the shot goes on, the image blurs into abstraction, as if to mimic the swirl of the cosmos itself. A voiceover begins and Godard cuts between the coffee cup and a shot that reveals the source of the voice: a young man quoting Wittgenstein’s Tractatus, ruminating on existence to our protagonist and, by extension, the camera. As the swirling of the coffee slows, the screen blurs into darkness, mimicking a fadeout, and we hear him speak: “Say that the limits of language are the world’s limits, that the limits of my language are my world’s limits, and that when I speak, I limit the world, I finish it . . . Death will come and abolish these limits . . . It will all be a blur.” The image begins to fall into focus. “But if by chance, things come into focus again . . . everything will follow from there.”
The scene is something like a cinephilic icon that screams Paris in the late 1960s, with rumblings of an early class consciousness Godard would continue to explore in his Maoist phase with the Dziga Vertov group over the next few years. Mark Cousins references the shot in the opening of his fifteen-hour film history documentary, The Story of Film (2011), and a scene from Martin Scorsese and Paul Schrader’s Taxi Driver (1976) uses the same effect with an effervescent tablet dumped into a glass of water.
I thought about this shot when I recently saw one of the short videos from OpenAI demonstrating their new Sora diffusion model, a text-to-video generative AI that, on paper, animates text prompts into video. While Sora is an early model meant largely to demonstrate the capabilities of AI-generated filmlike imagery to investors, their pitch seems to suggest this new technology will revolutionize not only the film industry but all of image production itself. “The dream”—imagine the great, unfilmed movie in your head, type a few key words into an awaiting prompt, and voilà—finally realized.
Was this some kind of joke? Who has been kept out of the film industry because their creative visions of tiny pirate ships in a mug of coffee have been thwarted—not only by industrial limitations but by the laws of physics themselves (which, we are told, Sora is “learning” to intuit)? I suppose one could ask what kind of cultural system would imagine a tiny pirate ship adrift in a coffee mug, such that it could provide any kind of pleasure, if not revelation. But soon some Southern California advocates began proselytizing about what they saw, with visions of new AI films featuring this or that character, across intellectual property or franchise lines, existing in the same video, as if that were why we watch moving image work in the first place. Here, Sora is about to democratize film production, akin to the DAW revolution in audio production a decade earlier.[1] Don’t let the pirate ships distract you, for it is only your imagination that is the limit.
Still, some of the other clips produced by Sora seemed if not more promising aesthetically, then at least aimed towards reasonable applicability, perhaps even in film production. An establishing crane shot of Tokyo in springtime bloom, a California gold rush town, and so on. And it’s not just Sora. The talk of AI video production has left the open office floors of Silicon Valley and rippled through the networks of the film industry. The recent Hollywood industry strikes by the Writers Guild of America and the Screen Actors Guild were staged in part over fears that studios would use AI to generate scripts, or to simulate an actor’s voice for future “virtual” appearances, avoiding the need to pay actors for future performances.
The contradictions embedded in this video—pirate ships in a cup of coffee, simulations of images that would otherwise require multiple layers of Hollywood labor—help us make sense of what is actually happening as Silicon Valley continues its pretensions of transforming age-old social functions into profit. Image production has been an essential human act for millennia, from cave scrawl to Photoshop; now it can be outsourced, as if we had been desperate to do something else with the time it took to shoot a scene. Classification and typification have been formal techniques of the human intellect for centuries, structuring our mythology and allowing scientists to chart speciation; now that work is a function of a computer program. It’s a strange irony, as this desire to bypass human labor through a machine learning model nevertheless only works by inputting thousands and thousands of human-produced images for the model to classify and spit out something that claims to be new. But the resulting image is just a synthesis of what was already there. In a sense, it’s trying to pass off something old as new: it’s Uber reinventing the bus. Perhaps AI is merely the latest symptom of the turn to financialization that Giovanni Arrighi has noted is the bellwether of a cycle of capitalist development shortly before its end.
These are some of the typical ways that AI has been received in popular discourse: as an irruption into longstanding cultural techniques and social practices. Yet, at the same time, the pirates in the coffee cup help illustrate what is truly at stake with this new development. For the most part, the reaction to Sora’s supposedly lifelike diffusion model has been, if not outright disgust, then fear—fear of a future that succumbs to what Silicon Valley has done to any number of other predigital industries, which can now barely survive in our contemporary ocean of ones and zeroes after the 2010s’ decade of tech’s creative destruction.
Tyler Perry, for instance, announced a pause on his $800 million Atlanta studio expansion after seeing Sora’s output, signaling a cost-benefit analysis that will surely reach other studios in due time. “I just hope that as people are embracing this technology and as companies are moving to reduce costs and save the bottom line, that there’ll be some sort of thought and some sort of compassion for humanity and the people that have worked in this industry and built careers and lives, that there’s some sort of thought for them,” Perry said. If there is any security here, it raises the question: for whom, or what? Surely not the workers in the Georgia film industry, who have seen a blossoming of jobs due to state tax credits that encouraged a marked increase in production over the past few decades. In Godard’s 2 or 3 Things there are surely dozens of people we don’t see and who go uncredited in the production; the film never exposes the means of its production, as Godard would later do in Tout Va Bien’s opening credits, for instance. Yet what gives us the shot of the coffee is clearly not anything Sora couldn’t do with a proper prompt: liquid, swirling, not unlike the Milky Way.
The problem is that, more than anything, the difference is initially felt. Some have pointed to the strange digital artifacts resulting from Sora’s hodgepodge process of image production. One video shows us a tabby cat on top of a bed, pawing its human companion in the face as if to demand a meal. Cute! Who among us, as they often say. However, we soon see a third arm manifest out of the cat’s chest and repeat the pawing action its left arm completed seconds earlier. Were you watching a movie, this is where the music would take an ominous, minor-key turn. Suddenly you realize that the human underneath the covers also has an unexplained extra appendage—an inexplicable arm—protruding from the comforter top. Horror!
It is here where security rears its head once again, this time not for the workers but for the image itself. The lure of cinema as a cultural form has long been said to emerge from the fact that it is an illusion that we nevertheless take as real. The spectator of its celluloid form sees motion on the screen, but knows that they are only seeing 24 still images, one after the other each second. We know the person on screen is a celebrity, but for the next two hours, they live in the nineteenth century. The cultural form of narrative cinema works because it requires a form of disavowal, such as the one proffered by the psychoanalyst Octave Mannoni: “I know very well, but all the same.”
“At stake here is not just the security of filmmakers but the security of the spectator, now finally forced to confront the contradiction at the heart of the cinematic illusion only to find it wasn’t supposed to be resolved.”
But the third arm breaks this illusion, not unlike an actor making eye contact with the lens of the camera. What is interesting about this error is not so much that it transforms what we take as “reality” through impossible digital physics, but rather that it calls attention to what we know about how the image was produced: by a text-to-video model that simultaneously announces its novelty while trying to pass itself off as an image we have encountered before. It is a variant of the so-called “uncanny valley” that critics have suggested results from digital representations of human figures in films like Final Fantasy: The Spirits Within (2001). But instead of derealization emerging from an asymptotic increase in the iconicity of the human form, these strange “failures” in Sora’s videos seem to serve as a kind of alibi for the skeptics among us who want to defend traditional, indexical cinematic image production. See? they suggest. It can’t get it right, as if getting it “right” were the metric by which we judge the use of images in our postmodern, digitally mediated society (anyone who has watched a moment of cable news could easily see this as the fool’s errand it is).
But this alibi merely functions to convince us that what we all know is coming will not, or that AI video production won’t transform the industry due to aesthetic flaws in its output. I tend to agree with Rob Lucas, who recently warned us that our position towards AI should be more dread than dismissal. Matteo Pasquinelli has described the dominant understanding of the purpose of artificial intelligence as a “quest to solve intelligence.” But while its threat to replace your middle school’s biology teacher, or newly-unionized graduate student English composition instructor, illustrates the danger of this logic, one might suggest its application in Hollywood as a hammer that finally found a nail. At stake here is not just the security of filmmakers but the security of the spectator, now finally forced to confront the contradiction at the heart of the cinematic illusion only to find it wasn’t supposed to be resolved.
At the same time, I think we are lying to ourselves if we pretend that the discomfort we feel from these demonstrations is purely the result of witnessing the hallucinatory failure of the depicted events to map onto physical reality. Each error we draw attention to, then, is akin to an alibi, a hope that it won’t come to what we all fear it will. Joan Copjec has described a similar process at work in the cultural reception of a historical figure upon whose death salacious rumors began to swirl—a process which calls into question what had been known about the figure in the first place. Whether we are trying to mine private correspondence to “discover” the Truth of an individual, or counting extra limbs sprouting from an otherwise photorealistic moving image, we seem to be possessed by a psychological fantasy that the accumulation of enough evidence can reconcile that discomfort and confirm what we assumed: that what is wrong with AI has anything to do with the images themselves.
“Confronted with the possibility of any fact’s being able to provide proof not only of a specific psychological intention but also of its contrary, unable to extract from any single or mass of facts a guarantee about our suspicions concerning the person these facts surround, the psychological construction supposes a subject behind the facts who has unique access to his or her own psychological intentions, who uniquely knows by virtue of being the living experience of those intentions. The psychological fantasy constructs an inscrutable subject, a kind of obstacle to all archival work, a question that historical research will never be able to answer.”
While I’m sure the folks over at OpenAI would love for me to anthropomorphize their text-to-video model, the extension of this metaphor from Copjec’s historical figure to a digitally constructed image is no error. We describe “its” failure; we critique its images as if to send a message to its creator, who in this case really only wants to sell the model to investors ready to find the Next Big Thing.
The truth is more disconcerting: these images are going to continue to proliferate whether we like it or not. The truth is, as Copjec notes, that there is “nothing (here) to fathom,” that these mistakes might simply be the latest in a long history of new media made more convincing because of their flaws. When JFK was shot in 1963, the live television broadcast cut to a reporter in the studio, desperately trying to reach someone on the ground in Dallas via a telephone connection that kept breaking up. The failure to connect suggested to viewers that what was happening on their screens was real, the urgency of the moment illustrating the limitations of the available technology to mediate it to awaiting spectators. The mistakes in today’s AI-generated clips then show us how seamless the future corrected images will be, or at least what OpenAI wants us to think they will be—the final dream of immediacy shorn of mediation.
A cat who can generate a new paw from its neck, an arm sprouting out from a blanket—what is upsetting about these images is precisely that they do look real despite these obvious flaws, which we all know will be corrected in a coming update. What these images augur for us is akin to the same fear we feel as we watch capital spread into every other realm of our social lives, transforming what heretofore felt solid into air and telling us that we haven’t seen anything yet. This time, however, it seems the creative destruction wrought by generative AI in the production of images is anything but creative—down to the nature by which these images are generated (by sampling existing photographs), or the sense that they are destroying the jobs of the visual artists who produced the images the algorithm uses to generate these strange videos in the first place. Our current period of capitalism no longer seems to even offer us the possibility of creation. It’s just destruction. Death will come and abolish these limits. It will all be a blur.
Ψ
André Bazin once suggested that cinema directors can be categorized into two groups: those who place their faith in what he calls “the image,” and those who, indifferent to the parlor tricks of new technological developments, are attuned to “reality.” This is a famous claim that anyone who teaches film history has had to grapple with, often before confused students who don’t see all that much difference between their phones and the celluloid image projected onto the classroom screen, converted to digital pixels and running through a laptop streaming video in a web browser. Bazin’s distinction between “the image” and “reality” is often lost on those of us who grew up in a world of digital access, but you can imagine how revolutionary that feeling must have been when the long takes of Orson Welles or Jean Renoir were in living memory.
On the one hand, there were directors taken with the power of editing who effectively told the viewer what they should think or feel in a scene through the shocks of dialectical montage between two juxtaposed images. On the other, were those who let you see what they saw, providing access to an unfolding in front of the camera. Bazin’s insight was that something was beginning to shift in image production during the 1930s and 1940s across Europe and America: directors were beginning to take seriously what is called the “profilmic” field (essentially, anything actually in front of the camera as it records what it does). For Bazin, this event in the history of image production represented a fundamental break from the tradition of Western art, which he thought had been so obsessed with the replication of the appearance of reality that it forgot its ability to transcend—or at least grasp beyond—material reality itself.
“It is precisely our desire to see what we want to see that has always been at the core of how we make sense of the images we see and how we live with them.”
Per Bazin, this profilmic inclination in popular cinema became not merely an aesthetic but an ethic. Directors such as Roberto Rossellini in postwar Italy began using raw materials in front of the camera not merely to present to their viewers what “it looked like,” but also to depict the unfolding history they were themselves living through. Rubble in Rome, freshly bombed as the Nazi occupation was ending—a setting for a filmed play. Nazi uniforms pulled from the detritus of what was left by fleeing soldiers—surely able to be worn by a nonprofessional actor who just so happened to speak German and had time to put on a performance one afternoon. Writing in France a few short years after the war, Bazin’s theories about filmic realism became arguably the most influential documents of what has come to be called “classical film theory,” concerned with the ontological nature of the filmic medium and its relation to existing, “respectable” bourgeois art forms like theater, painting, or sculpture. The profilmic event effectively had an afterlife projected onto each cinema screen, as Bazin saw it, almost living again on each frame of the film strip as the light bounced off the actor in front of the camera to produce her likeness in the apparatus. A mummy complex, he called it.
The profilmic field has a powerful relationship not only to film history but to our broader culture of image production. Eadweard Muybridge’s motion studies in the late nineteenth century are to this day often described as foundational scientific inquiries into the nature of reality using technical optical tools that surpass the power of the human eye alone. Just over a decade before Bazin published his writings in France, Walter Benjamin identified what he referred to as the effectively objective nature of the camera’s ability to mechanically reproduce reality free from the artist’s interference. But this was no machine that sought to reduce human labor (to great environmental cost!) in the name of technocratic efficiency. “The camera introduces us to unconscious optics as does psychoanalysis to unconscious impulses,” he wrote in his famous “Work of Art in the Age of Mechanical Reproduction” essay. Disavowal, the optical unconscious: these were not merely modes of interpreting cinema that emerged alongside its rise as the dominant mass media form of industrial modernity. Throughout the twentieth century, these contradictions—cinema’s ability to represent reality in an indexical image, or to conjure fantasies from and for the imagination—became oft-repeated to the point of parody. During the 1970s, film theorists in France and the Anglophone world began turning away from Bazin’s ethic of realism, fearing in their post-1968 hangover that mainstream cinema’s obsession with likeness was part of a broader cultural logic designed to reify a kind of protocapitalist realism into the shared field of visuality.
But by the mid-1990s, as animators at George Lucas’s Industrial Light and Magic began integrating digital image production with celluloid film capture, something began to shift, suggesting Ben Stiller was quite off base when he tried to tell a generation that Reality Bites. And yet what brings me to the present conjuncture is something of a disconnect between this genealogy of filmic realism and the institutions of image production we find ourselves enmeshed in today. Effectively, the third arm of the cat in Sora’s AI video serves a dual function. It at once assures skeptical viewers that such image production could never replicate Real Cinema due to its surreal physical impossibility, while also presenting itself as a convenient problem in the investor pitch that these videos are so clearly designed to be: they are so real that they are creating more reality.
The threat that AI-generated videos present to film culture—to stable employment for film artists, animators, cinematographers, and crews, or even to the realm of the aesthetic itself—cannot be countered on the terms of the resulting images’ verisimilitude or lack thereof. Bazin’s ethic of realism may have been the historical byproduct of witnessing a generation of filmmakers responding to the shifting postwar conditions they found themselves in, with the celluloid technology they had available at the time. But as Philip Rosen has argued, Bazin’s insight need not be one that reifies realism in the image (or its lack) as such. The indexical claim of a strip of celluloid film recorded during the 1940s or the iconicity of an AI image of pirate ships in a cup of coffee could offer themselves as solutions to what is effectively a desire on the part of the spectator, regardless of the ontological conditions of the image’s construction.
Bazin described the lure of the photographic image as a kind of security against death itself, a “preservation of life by a representation of life.” But missing from this quote—or any other in Bazin’s writing, for that matter—is the word index itself. Rosen posits that we can think of Bazin’s understanding of signification through what he calls the indexical trace: a sign that does not simply require the referent in the creation of the sign itself, but one in which the referent was once present in the past. Arguably, the generative nature of AI fits this category implicitly: each image you see, from the pirate ships to the cat’s paw, is pulled from existing imagery, photographic or otherwise.
And yet it is precisely the spectator’s desire for belief in the image that activates this trace, as Rosen goes on to argue:
“Bazin must assume that the special credibility of photographic and cinematic images is based on the subject’s prior knowledge of how any such images are produced. Furthermore, that production is apprehended as coming from some past moment, which makes temporality a crucial component of the process for the subject. Bazin links both of these assumptions to the idea of a subject obsessively predisposed to invest belief in such an image.”
The trap of “is it real?” is precisely the trap generative AI sets out for skeptical viewers—and those who are in charge of its programming want nothing more than to prove that it can be. One tactic might be to firmly reject the use of AI, to organize against creative destruction before every aspect of human sociality is handed over to Silicon Valley. But what Rosen and Bazin suggest is that how we look is just as important. It is precisely our desire to see what we want to see that has always been at the core of how we make sense of the images we see and how we live with them.
For all the talk of digital novelty and text-to-video models that conjure their own “profilmic” fields, what comes out of these models looks like cinema because that’s what its creators want to see, what spectators want to see, and, ostensibly, what potential investors and studios want to see. Fighting a battle over the truth claim of the image is not only a bad tactic to stave off AI’s creative destruction in the industry, it is also a betrayal of the lure behind the contradiction that lies at the heart of cinema itself. If we truly are spectating subjects “obsessively predisposed to invest belief” in the image, then maybe cinema needs to give its spectators something to believe in.
Ψ
We might implicitly know that the coffee Godard turns into a galaxy-like swirl in 2 or 3 Things was actually inside a cup inside a Paris café inside the country of France inside of Europe during the late 1960s. But our knowledge of the existence of the profilmic coffee and the spiraling sugar effectively fades into the background once he begins to pull the camera out of focus, giving the image a hazy blur that feels like a fade to black. It isn’t until Godard refocuses his camera on the bubbles that we are suddenly reminded that what we were looking at was more than the component parts that went into the production of the image, that some kind of representational limit was reached—if not surpassed—by what we had just witnessed on screen. In this sense, the narrator’s melodramatic, existential ruminations might suggest a different way of engaging with the digital image, one that doesn’t rely on its claim to truth, its iconicity or verisimilitude, or even its narrative function within the film itself.
Say that the limits of language are the world’s limits, and the limits of my language are my world’s limits, and that when I speak, I limit the world, I finish it.
But if by chance, things come into focus again, everything will follow from there.
[1] “Digital Audio Workstation.” Yet, unlike Sora’s generative image production, DAWs still require proficiency with the instruments to be recorded and mixed within them. Sora, instead, seems to proffer full automation—from the conception of the image before its recording to its culmination via mediation.