Emergence
New work in new media by Deniz Tortum
Replicating Images: Submerged, AI and Immersive Cinema
In October, Apple released its first scripted immersive film production, Submerged, on Apple Vision Pro (AVP), its mixed-reality headset. The film follows a submarine crew during World War II who fight for survival after enduring a torpedo attack. The high-end technological production included a submarine set, a proprietary 3D camera by Apple and headsets for the crew to monitor the shots in real time. The production design is impressive, the use of close-ups peculiar and novel—it’s a well-crafted film. However, more than the film itself, the experience of watching Submerged on the headset is what truly shines. A making-of trailer highlights the qualities of the AVP headset. The film’s Oscar-winning director Edward Berger says, “It’s much more real than anything else.” Production designer Nicholas Palmer emphasizes the level of detail you see in the headset: “When you’re watching the film on Vision Pro, you see everything in focus. Every detail is apparent.”
Yes, these attributes are made possible by the quality of the headset, but in these soundbites the team behind the project also points to the fundamental affordances of the VR medium. There are two main qualities of immersion in VR: the surrounding image space (like the 360-degree Panorama, the widescreen of Cinerama, the recent dome venues) and the three-dimensional depth perception of stereoscopy (stereo photos, 3D movies). These two aspects, however, could not be combined until the VR headset. By creating imagery in real time through a computational process—for a 360° video, the full 360° video file is reframed 60 times every second based on the head movement data of the user—the VR headset is the first technology to combine the surrounding, panoramic image space with stereoscopic 3D depth perception. (This is an observation Jens Schröter makes in his book 3D: History, Theory, and Aesthetics of the Transplane Image.) In no other exhibition setup can you look around while keeping spatial depth perception intact. The AVP sharpens these affordances further: increased resolution, an increased frame rate of 90fps (persistence of vision in cinema is 24fps; in VR, persistence of simulation, for lack of a better term, is closer to 120fps) and its own proprietary immersive video format.
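To make the reframing concrete, here is a minimal sketch (not Apple's or any vendor's actual pipeline) of what a headset does on every frame: cut a perspective viewport out of a full equirectangular 360° frame according to the viewer's head orientation. Real headsets do this on the GPU, per eye, 90 or more times per second; the resolution, field of view and head pose below are illustrative assumptions.

```python
import numpy as np

def reframe(equirect, yaw, pitch, fov_deg=100, out_w=1280, out_h=1280):
    """Sample a perspective view from an equirectangular frame (H x W x 3)."""
    h, w, _ = equirect.shape
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)   # pinhole focal length

    # Ray direction for every output pixel, in camera space (z points forward).
    xs, ys = np.meshgrid(np.arange(out_w) - out_w / 2,
                         np.arange(out_h) - out_h / 2)
    rays = np.stack([xs, ys, np.full_like(xs, f, dtype=float)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate the rays by the head pose (yaw around y, pitch around x), in radians.
    cy, sy, cp, sp = np.cos(yaw), np.sin(yaw), np.cos(pitch), np.sin(pitch)
    rot_x = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rays = rays @ (rot_y @ rot_x).T

    # Convert the rotated rays to spherical coordinates, then to source pixels.
    lon = np.arctan2(rays[..., 0], rays[..., 2])          # -pi .. pi
    lat = np.arcsin(np.clip(rays[..., 1], -1, 1))         # -pi/2 .. pi/2
    u = ((lon / (2 * np.pi) + 0.5) * (w - 1)).astype(int)
    v = ((lat / np.pi + 0.5) * (h - 1)).astype(int)
    return equirect[v, u]

# One pass of the loop: rerun for every new head pose the headset reports.
frame = np.random.randint(0, 255, (2048, 4096, 3), dtype=np.uint8)  # stand-in 360° frame
view = reframe(frame, yaw=np.radians(30), pitch=np.radians(-10))
```

The point of the sketch is that the "surrounding image" never exists as a single picture the viewer sees; it is recomputed from the source file for each head position, which is what lets panoramic coverage and stereoscopic depth finally coexist.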
These new developments in hardware realize a dream that is almost 250 years old. When Robert Barker patented the Panorama in 1787, he wrote that the goal of his invention was “to make observers… feel as if really on the very spot.” When the Oculus Rift went on the market in 2016, it was marketed with the tagline: “You’ll feel like you’re really there!” Not only do we seem to be repeating the dreams of the past, we’re also defining the medium of VR with objectives and language that are more than two centuries old. This may be hindering new possibilities inherent in the medium.
While the hardware is on a mission to realize a centuries-long dream, AI tools developing alongside this work are providing more and more content for the platform, acting as a new kind of transmedia converter. I directed an immersive film, Floodplain, in 2018, in which a search party looking for someone lost in a forest becomes lost themselves. Initially, we planned to shoot it on one of the few professional 360° cameras available at the time, but test shoots revealed that the camera couldn’t capture the forest’s details—leaves in motion and other arboreal details became a blur. To achieve a sharper 6K image, we built a custom rig. After weeks of managing data, stitching footage and trying to get an export, we discovered the available headsets could only handle 4K playback. We had to compress the film into a 3GB 4K H.265 file for the Samsung Gear VR, the headset festivals used in 2018. The leaves became a blur again.
With AVP offering playback up to 12K, I’m sure that a remastered Floodplain—a 6K export—would look better. However, today’s AI tools could enhance the film further. Floodplain was shot in 6K, mono, at 30fps. We can use a software tool like Topaz AI to boost the resolution to 12K, generate sharpness and detail, and strip out the filmic artifacts, such as grain (celebrated in the film world, looked down on in the VR world). We can also increase the frame rate to 90fps or 120fps to reduce simulation sickness. Or, using a recently introduced tool like Immersity AI, we can convert 2D images into 3D ones: Floodplain was shot mono, and this tool can, clumsily, turn it into a stereo 360 version by analyzing the image and creating a depth map, then projecting the image onto that depth map to create a stereo pair. These tools would make the film look both more realistic and more immersive.
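A minimal sketch of that last step, assuming the depth map has already been estimated by a monocular depth network (this is not Immersity AI’s actual pipeline): pixels are shifted horizontally by a disparity proportional to their nearness to synthesize a second eye. Production tools also inpaint the holes this shifting leaves behind; the version below simply leaves them black.

```python
import numpy as np

def mono_to_stereo(frame, depth, max_disparity_px=24):
    """frame: H x W x 3, depth: H x W (larger = farther). Returns (left, right)."""
    h, w, _ = frame.shape
    # Nearer pixels get larger disparity; normalize depth to [0, 1] first.
    d = (depth - depth.min()) / (np.ptp(depth) + 1e-6)
    disparity = ((1.0 - d) * max_disparity_px).astype(int)

    cols = np.arange(w)
    left = np.zeros_like(frame)
    right = np.zeros_like(frame)
    for y in range(h):
        # Shift each row half the disparity outward for each eye; unfilled
        # positions stay black (real tools inpaint these occlusion holes).
        lx = np.clip(cols + disparity[y] // 2, 0, w - 1)
        rx = np.clip(cols - disparity[y] // 2, 0, w - 1)
        left[y, lx] = frame[y, cols]
        right[y, rx] = frame[y, cols]
    return left, right

# Usage: run per frame of the 360° video, then pack the eyes top/bottom,
# a common delivery layout for stereo 360 video.
frame = np.random.randint(0, 255, (1024, 2048, 3), dtype=np.uint8)
depth = np.random.rand(1024, 2048)  # stand-in for an estimated depth map
left_eye, right_eye = mono_to_stereo(frame, depth)
stereo_over_under = np.concatenate([left_eye, right_eye], axis=0)
```

The clumsiness I mention shows up exactly here: the synthesized eye is guessed from a single viewpoint, so whatever the original camera never saw has to be invented.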
What is more interesting here, though, is the idea that every image carries its 3D spatial data within it. In a recent paper, “ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning,” researchers from Google and the National University of Singapore introduced a new method called ReCapture. With it, one can add camera motion to a shot that was originally locked down on a tripod: the static shot becomes the source from which a moving shot is regenerated. In the context of these AI tools, images carry additional layers of information that can serve as instructions for new images. An image is no longer an end in itself; it is data waiting to be transformed into something else. Applied to viewership, this framing implies an interesting shift as well: watching a film becomes more like reading a legal text or a piece of code—imagining all the things it could enact in the world, seeing a film as a framework or a grammar of possibilities.
Watching the end credits of Submerged, I noticed that the copyright belongs to Apple Sentity LLC, an Apple subsidiary based in Culver City, California, whose business purpose is stated as “video production.” The name caught my attention. How did they come up with it? What does it mean? My best guess is that it’s a fusion of “sentient” and “entity”—a nod to the merging of consciousness and technology. Regardless, the choice of “Sentity” as the company’s name hints at how the cultural logic of AI is seeping into the ways in which Apple is thinking about filmmaking.
A near-future I imagine with the development of these AI tools is one in which every film and video can be transformed into immersive content. Let’s take a film shot in the 1920s—silent, black and white, 4:3 ratio. This film could be given color, sound, a 180-degree frame and stereo depth perception. Even though it was shot to be screened in a theater, it could be adapted to be seen natively within the headset. André Bazin argues in “The Myth of Total Cinema” that the founding myth of the cinematic medium, voiced again and again by its inventors, was the possibility of creating a complete illusion of life. Cinema’s technological development followed this trajectory as well, from the invention of sound film to color: the medium strove to be reality itself. Not only could AI in combination with VR further realize this early dream, but it may also realize it retrospectively.