Did You See (and Hear) That? Why Home Viewing Can Be Dark and Inaudible
If the most frequently used pejoratives for contemporary films—“dim,” “muddy,” “inaudible”—are to be believed, we’ve entered a literal dark age of cinema, with cinematographic and sonic tools pushing filmmakers to ever-greater depths of audiovisual obscurity. For more than a decade, Christopher Nolan has incurred the wrath of audiences who prefer their dialogue clearly audible.1 Five years ago, the Game of Thrones episode “The Long Night” propelled the topic of the visual “New Darkness” into mainstream discourse. That this seemingly unsustainable state of affairs has persisted for as long as it has is, to many outside observers, perplexing. Why haven’t we in the industry solved it? Who’s to blame?
Much of the discourse attempting to pinpoint supposed “causes” of these trends tends to focus on forces perceived as acting upon filmmakers. These attempts curiously discount the agency of filmmakers themselves, as well as the perhaps inconvenient idea that cinematographers, sound designers and directors might find these newly achievable aesthetics exciting in and of themselves. There is, however, a blind spot shared by many audiences, critics and some of us within the film industry: a relative lack of attention paid to the structural faults in the technical pipelines that deliver images and sounds to average viewers. We could all stand to profit from studying these mechanisms and their failures.
Contemporary trends in audiovisual design push the envelope of what current home-viewing technologies can reproduce, and neither side benefits. In stark contrast to the technologically advanced viewing environments professional filmmakers operate in, most real-world viewing conditions run counter to optimal image and sound clarity. Consumer displays, rarely calibrated for accurate image reproduction and often sitting in bright rooms, are in direct conflict with the low-contrast, low-white-point images frequently produced by contemporary cinematographers, while consumer sound equipment—most often designed to minimize footprint and to dynamically massage the audio so it sounds “punchy”—is at odds with accurate, clear reproduction of the subtleties of contemporary high dynamic range sound mixes.
Marketing expediency frequently operates at cross purposes with aural and visual fidelity. “You can market a tool,” says visual effects artist Todd Vaziri (Avatar, Rogue One: A Star Wars Story). “Tools sell. They’re quantifiable.” This applies to every facet of exhibition: Numerical gains in resolution are prioritized above high-quality data, an auditory perception of “punchiness” is prioritized above accuracy in sound reproduction and the convenience of streaming trumps adequate data rates, leading to garbled images. Technological illiteracy on the part of filmmakers and audiences alike is closely related, and central to this issue is the inordinate complexity inherent to current technology, largely brought on by deliberately confusing branding and gimmicky marketing pushes.
The convergence of the previously relatively separate worlds of content intended for viewing in theaters and that intended for home viewing is likewise a contributor. The advent of high-definition video formats such as Blu-ray and HD DVD in the mid-2000s meant that home video could meaningfully approximate the theatrical experience visually, and the gap has only narrowed since. This evolution has further emboldened filmmakers to push aesthetics previously only meaningfully viewable in a theatrical context into content intended for home viewing. “There are no single bad guys in this little narrative,” says Academy Award–winning sound designer Mark Mangini (Dune, Mad Max: Fury Road, 32 Sounds). The answers, such as they are, can only be arrived at by disentangling several elements.
Aural Mud
The charge is simple and difficult to refute: Audiences can’t hear the actors speak. This criticism is most often (though not exclusively) directed at contemporary narrative fiction feature films, a distinction key to understanding some of the mechanics of the issue. In contrast, “television has a standard that we effectively call Loudness K-weighted Full Scale (LKFS),” says Mangini. The standard specifies not only a peak level (how loud a film’s soundtrack gets at its loudest) but also a target average loudness. Netflix’s standards, for example, specify an LKFS of -27, with wiggle room of two decibels in either direction. Theatrical feature films are subject to far less scrutiny. “None of those metrics are applied,” says Mangini. “It is sort of a free-for-all.” Dialogue is set at what is deemed a natural speaking level, and the soundscape is designed around that reference point. “Anything can be anything!”
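To make the broadcast side of this concrete, here is a minimal sketch of the kind of loudness check a delivery spec like Netflix’s implies, using the open-source pyloudnorm library to take an ITU-style K-weighted measurement. The -27 LKFS target and two-decibel tolerance mirror the figures cited above; the file name and the pass/fail logic are purely illustrative, not any streamer’s actual QC tooling.

```python
# Rough sketch of a loudness compliance check against a -27 LKFS (+/- 2 dB) target.
# pyloudnorm implements ITU-R BS.1770 K-weighted loudness measurement.
import soundfile as sf
import pyloudnorm as pyln

TARGET_LKFS = -27.0
TOLERANCE_DB = 2.0

def check_loudness(path: str) -> None:
    data, rate = sf.read(path)                    # decoded PCM samples + sample rate
    meter = pyln.Meter(rate)                      # K-weighted loudness meter
    loudness = meter.integrated_loudness(data)    # integrated loudness in LKFS/LUFS
    deviation = loudness - TARGET_LKFS
    status = "PASS" if abs(deviation) <= TOLERANCE_DB else "FAIL"
    print(f"{path}: {loudness:.1f} LKFS ({deviation:+.1f} dB vs target) -> {status}")

# "final_mix_stereo.wav" is a hypothetical deliverable used only for illustration.
check_loudness("final_mix_stereo.wav")
```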
These mixes tend to be optimized for cinemas, whose characteristics are distinct from the vast majority of home viewing environments. “An audience is willing to accept a certain level of bombast in the presentation of audio in a cinema as opposed to at home,” says Mangini. Cinemas are quieter; the “noise floor,” or the average level of ambient sounds around a viewer, is far higher at home2, with dishwashers, lawnmowers, neighbors, family members and pets all contributing. In a cinema, a relatively soft detail might be easily heard; at home, such details (or dialogue) tend to get lost due to the encroachment of ambient noises.
These characteristics mean that films aimed at theatrical audiences tend toward a wide dynamic range in audio levels: Dialogue might be leveled at a regular speaking volume, but an explosion or brass blast might be orders of magnitude louder. Contributing to this phenomenon is the expanded latitude of digital soundtracks. Before the advent of digital sound, optical tracks featured relatively limited dynamic range. Since the transition toward digital soundtracks in the 1990s, the possibilities for dynamic range in soundtracks have led to something of an arms race. As Mangini describes, “The score’s just doing this incredible thing. But wait a minute, I can’t hear what they’re saying! So, you raise the dialogue up. Now, somebody says they can’t hear the guns anymore. The sound effects come up. Now, the music has to come up. You end up trying to top yourself.”
Bombastic sound mixes present a significant issue when translated to home viewing, so nearly all theatrical mixes are subject to remixes for home video. “It is an accepted axiom that most home theater consumers do not like wide dynamic range,” says Mangini. “Most complaints [that are] ‘I can’t hear the dialogue,’ they’re in that situation where you’re lying in bed with your loved one, and it’s 11 at night. In that environment, you don’t want bombast. You just want the story presented to you.” So, the dynamic range needs to be compressed. Ideally, this means a full remix. Some engineers build exotic cocktails of compressors and limiters to control dynamic range, while others do it all by hand, manually reducing loud audio peaks or extended loud sections of music and sound effects.
But in many instances, this process is entirely automated. The differences can be drastic: Home theater mixes are often mastered at 78 dB rather than the theatrical standard of 85 dB. In effect, this is less than half as loud as the theatrical standard, lowering the sound’s “ceiling” by seven decibels—in extreme cases, a gunshot is no louder than raised voices. Some streaming services now automatically compress mixes in an attempt to further reduce this dynamic range. “It’s not a tailored minute-by-minute dynamic compression; it’s a one-size-fits-all compression that eliminates all the ways the mixers carve out space for voices,” Ursinus College media studies professor Eric Dienstfrey says. “You’re actually making the dialogue harder to hear by mucking with it so much.”
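Two toy calculations help illustrate these numbers, assuming nothing beyond basic decibel arithmetic: a seven-decibel drop in the ceiling corresponds to a sound-pressure ratio of roughly 0.45, which is why the home master is “less than half as loud,” and a blunt, static compressor of the kind Dienstfrey describes applies the same gain reduction to every loud moment regardless of what it is. This is a caricature for illustration, not any service’s actual processing chain.

```python
# 1) Decibels are logarithmic: -7 dB is an amplitude ratio of 10**(-7/20) ~ 0.45.
# 2) A crude "one-size-fits-all" compressor: every sample above a fixed threshold is
#    pulled down by the same ratio, whether it is a gunshot, a music swell or dialogue.
import numpy as np

def db_to_ratio(db: float) -> float:
    return 10 ** (db / 20.0)

print(f"-7 dB as an amplitude ratio: {db_to_ratio(-7):.2f}")  # ~0.45, less than half

def static_compress(samples: np.ndarray, threshold_db: float = -20.0, ratio: float = 4.0) -> np.ndarray:
    """Apply identical gain reduction to every sample above the threshold."""
    threshold = db_to_ratio(threshold_db)
    out = samples.copy()
    over = np.abs(out) > threshold
    # Above the threshold, only 1/ratio of the excess level survives.
    out[over] = np.sign(out[over]) * (threshold + (np.abs(out[over]) - threshold) / ratio)
    return out
```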
These complications extend to the multichannel layouts common to cinematic exhibition. Theatrical films are almost always mixed for and exhibited in surround sound formats. A “hero” format of high fidelity is chosen, generally Dolby Atmos, 7.1 or 5.1, then mixed in highly controlled sound environments where clean reproduction is assured—the same kind of environment that will give an audience the best experience of that mix. Studios and streamers, however, require various submixes. Mixed in Atmos? Well, you still need to deliver a 7.1, 5.1 and stereo mix. “This is where problems begin,” says Mangini. “The vast majority of streaming audiences view content via stereo, meaning only two channels of audio are being reproduced. If that’s the way most of the world is consuming it, you’d think that would be the mix we spend the most time on.” The opposite is true. The majority of the time, stereo mixes are automatically downmixed from the hero mix, never monitored and never quality controlled—Frankenstein mixes that nobody’s heard but everybody has to listen to. “There’s an art to a stereo mix,” Dienstfrey says. “Stuffing seven channels into two, that’s going to cause problems for the dialogue.”
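What an unmonitored fold-down looks like in practice can be sketched with the familiar ITU-style coefficients: the center channel, where nearly all dialogue lives, is attenuated by about 3 dB and summed into the same two channels as music, effects and surrounds. Real downmixes vary by format and metadata; this sketch only illustrates why stuffing six or more channels into two squeezes the dialogue.

```python
# Minimal sketch of an automatic 5.1-to-stereo fold-down using -3 dB (0.7071)
# coefficients for the center and surround channels. Dialogue, carried almost
# entirely by the center channel, is simply summed into the same two channels
# as everything else, with nobody monitoring the result.
import numpy as np

def downmix_5_1_to_stereo(fl, fr, c, lfe, sl, sr):
    """fl, fr, c, lfe, sl, sr: equal-length numpy arrays of channel samples."""
    att = 0.7071  # -3 dB
    left = fl + att * c + att * sl    # the LFE channel is commonly discarded in stereo fold-downs
    right = fr + att * c + att * sr
    return np.stack([left, right], axis=-1)
```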
While high-end home cinema audio equipment has evolved over the course of the past three decades, built-in television audio equipment has devolved. “You are almost compelled to buy a sound bar if you want anything resembling decent sound, which is to say a reasonable amount of dynamic range and frequency response, free of acoustic anomalies, because most manufacturers aren’t producing television sets with two speakers on them anymore,” Mangini says. Sound bars, which have dominated the consumer space for audio monitoring solutions in recent years, are themselves a compromise. They are significantly more convenient than traditional bookshelf speakers, and much smaller, but there are tradeoffs: The size of a speaker’s driver is directly related to its ability to accurately reproduce a wide range of frequencies, so most bookshelf speakers have a driver at least five inches in diameter. “The biggest speakers in the biggest sound bars are around half that size,” Mangini says.
As a result, consumer- and budget-oriented sound bars often rely on a set of processes to enhance perceived sound quality. “The software is making decisions, and they’re not necessarily the right decisions,” says Dienstfrey. “It’s going to sound so much more impressive than your piddly TV, but to do this it’s playing with audio levels in a way that might actually be hurting the dialogue.” He relates a story about some friends who attempted to watch Mad Max: Fury Road on their new “surround” sound bar: “There were moments of disappointment when we were clearly hearing things incorrectly, like the panning of audio. It was supposed to be smart, but you were hearing strong jumps in volume.” Instead of presenting an accurately reproduced signal, the basis of what engineers consider “sound quality,” the sound bar was effectively attempting a live audio remix to impart a veneer of apparent quality.
The marketing tactics for these devices have led to widespread confusion as to their capabilities. “I personally equate [Dolby] Atmos with immersive audio from hardwired speakers,” says Mangini, discussing a mixing protocol that allows for sounds to be placed and then reproduced as objects in a three-dimensional space. Atmos requires a wide array of speakers to adequately reproduce an immersive soundscape, typically upward of nine. One might, however, see an “Atmos” button on their phone that has the apparent effect of merely adding a bit of spatial reverb to one’s headphones, or see a small sound bar that carries the “Atmos” branding. Dienstfrey thinks this sows confusion, opining that “some team made the decision that they were going to call everything Atmos, even things that are very different from how Atmos initially was conceived. I think it’s actually killing the brand.” Mangini concurs, saying that the “Atmos” branding might lead consumers to think they’re getting a high fidelity experience, but “Atmos on a phone is a marketing gimmick. It’s doing a variety of phase-related gimmickry to create an acoustic sensation of sound that’s somewhere off of the phone.”
Visual Mud
There’s a kind of rhyme between the widely diagnosed issues of aural legibility in contemporary cinema and those of visual legibility, both in their perceived and actual causes and effects. Contemporary visuals are commonly diagnosed as “dark,” “underexposed” or “underlit.” In actuality, these terms describe an array of phenomena, many of them widely misunderstood.
The most common charge, “dim,” is often used interchangeably with “underlit.” Tools are frequently blamed; “the digital look” is as much an accusation of modern equipment as an assessment of its apparent effects. What’s usually being described, in technical terms, are films with a low average or peak Institute of Radio Engineers (IRE) level, a unit of measurement that describes the brightness of a video signal: 100 IRE is white; 0 IRE is black. A “traditionally” handled digital image will usually render light-skinned people at somewhere around 60 IRE, going down to around 30 IRE for people with darker skin tones. The vast majority of this 0 to 100 IRE tonal range is generally used. It has become common, however, for the brightest parts of a film’s image to scarcely exceed 60 to 70 IRE, with peaks sometimes going even lower. This leads to what might be called “dim” images.
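A rough way to quantify “dim” is to map a frame’s luma values onto that 0-to-100 scale and look at the average and peak. IRE is an analog-era unit, so the conversion below, which assumes 8-bit video-range code values (16 for black, 235 for reference white), is an approximation for illustration rather than a broadcast measurement.

```python
# Approximate per-frame IRE statistics from 8-bit video-range luma.
import numpy as np

def frame_ire_stats(luma: np.ndarray) -> tuple[float, float]:
    """luma: 2-D array of 8-bit video-range luma code values for one frame."""
    ire = (luma.astype(np.float64) - 16.0) / (235.0 - 16.0) * 100.0
    ire = np.clip(ire, 0.0, 109.0)   # allow a little headroom above reference white
    return float(ire.mean()), float(ire.max())

# A frame whose peak never exceeds the 60s and whose average sits in the 20s is
# the kind of image viewers describe as "dim."
```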
Speculation as to the cause is rampant: Is it to hide VFX elements? Laziness? Studio meddling? Streamer mandates? “Why was there a period of time in fashion where men’s polo collars were flipped up?” jokes Vaziri. Fashion, enabled by tools, plays a primary role here. Though often attributed to digital cameras, the specific tools allowing cinematographers and directors to pursue this look exist predominantly in post-production. Photochemical workflows didn’t allow for images to be darkened or their peak brightness to be “clamped” in post-production in the same way permitted by modern digital tools, which let a colorist take a bright image and, as long as the shot’s exposure retains data in the highlights, freely darken it. Directors, cinematographers and colorists can now create images that exist in the extremities of any given color space without risking disaster—at least, as long as they assume that the audience’s viewing environment is amenable.
Yet, these tools aren’t alone in driving these trends. The interplay between technological innovation and filmmakers is complex. While certain large-scale and sudden historical movements in A/V technology, such as the transition to synchronized sound in the late 1920s and various widescreen formats in the early 1950s, were very much mandated by studios, the ways in which new toolkits manifest themselves onscreen are often driven by the volition of artists. Cinematographer Oren Soffer (The Creator) dismisses the idea that streaming services such as Netflix are imposing the low-IRE digital color grading look on filmmakers: “What we often perceive as trends are really just a case of cognitive pattern-seeking bias; if you look across Netflix’s entire library, you’ll find that a vast majority do not have the ‘murky’ look…. I do think it comes down to a matter of filmmakers’ taste. For whatever reasons, they have to prefer that look.” Vaziri similarly dismisses the idea that these decisions are made to mask visual effects3: “Desaturated, murky VFX are not cheaper. The price the studio pays is going to be the same because the job of the VFX company is to deliver a rich image with which the colorist can do anything they want.”
Whatever the case, these trends exist. Critics tend to focus on the “darkness” of the images themselves, but more often than not the issues lie elsewhere, in how these images are shown to audiences at home. The issues start with display calibration. Out of the box, most consumer televisions, phones, tablets and monitors are calibrated to “pop”—to show off just how bright and vivid the display can get. Intended to wow consumers, this calibration is generally at odds with accurate image reproduction. “I find that most people have their settings too bright, not too dark,” says Duplitech colorist Blake Blasingame. “If you look at an OLED in an average person’s home, it’s pretty likely that they’ve got that backlight all the way up.” Viewing environment plays a complementary role: Most televisions are situated in brightly lit rooms, and therefore need to compete against natural and artificial light. All content will look dim in this context, but content that has been specifically crafted to look dark in ideal viewing conditions will disproportionately suffer, often rendering it illegible. So, up goes the brightness on the panel. “Filmmaker Mode,” an industry initiative launched in 2019 aimed at TV manufacturers and viewers alike with the goal of encouraging more accurate calibration in consumer panels, is intended for dark viewing environments; it’s no match for a sun-drenched living room. “Once the color grade leaves this facility, it’s a wild, wild west out there,” says Blasingame.
Adjusting one’s display to account for a bright viewing environment might solve problems of basic legibility, but doing so further distorts color accuracy. Shadow tones are artificially lifted, leading to washed-out dark areas, banding artifacts caused by a lack of tonal data in darker areas and the exacerbation of previously hidden compression artifacts. Highlights are pushed so as to exceed the display’s potential color spectrum, resulting in the “clipping” of highlight details, rendering the brightest parts of the image as textureless areas scrubbed of detail.
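The damage done by compensating at the panel can be caricatured in a few lines: lifting and stretching the picture pulls once-adjacent shadow code values apart, which is what reads as banding, while anything pushed past the top of the range clips to pure white. The numbers below are arbitrary and stand in for display processing in general, not any manufacturer’s algorithm.

```python
# Caricature of a viewer-applied brightness boost: shadow steps spread apart
# (banding) and bright values pile up at 255 (clipped highlights).
import numpy as np

def brighten(luma_8bit: np.ndarray, lift: int = 20, gain: float = 2.0) -> np.ndarray:
    boosted = luma_8bit.astype(np.float64) * gain + lift
    return np.clip(boosted, 0, 255).astype(np.uint8)   # anything pushed past 255 clips

shadows = np.array([20, 21, 22, 23], dtype=np.uint8)      # neighboring shadow code values
print(brighten(shadows))      # -> [60 62 64 66]: once-adjacent steps are now two values apart
highlights = np.array([200, 220, 240], dtype=np.uint8)
print(brighten(highlights))   # -> [255 255 255]: highlight texture is gone
```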
The manner in which films have been delivered and the form they take when displayed have compounded these issues. Prior to the dominance of streaming, physical media was the central delivery medium for home viewing. Blu-ray discs, by current standards, featured extremely high data rates: With a resolution of 1920×1080 (roughly two million pixels) and an average bitrate of 25,000 kilobits per second (kbps), each pixel of each frame of a 24-frame-per-second film is allotted about 0.5 bits. This number is known as bits per pixel (BPP). Throughout the 2010s, consumer displays and delivery media, including Blu-ray, gradually transitioned to 4K resolution formats that featured four times as many pixels. Data rates, however, have not kept up even on premium formats such as “Ultra HD” Blu-rays, which typically have a data rate closer to 50,000–60,000 kbps. That results in a BPP of 0.25–0.3—just about half.
Streaming is in another realm entirely. Netflix’s maximum 4K streaming video bitrate is approximately 20,000 kbps4—a BPP of 0.1, or around 20 percent of an average Blu-ray. In real-world usage, the effective bitrate is usually significantly lower. Huge swaths of the data comprising the image are removed in one way or another for the sake of a smaller file. “If everybody was on a fiber-optic internet connection, then there wouldn’t be a problem,” says Blasingame. “But it’s not cost effective. So, they’re giving you the bare minimum of what passes with the average person.” This aggressive compression wreaks havoc on dark areas, which the algorithms generally target as less visible. When a scene, or a film, exists entirely in those dark IRE levels between 0 and 25, phenomena like the widespread banding and artifacting noticed by many on the broadcast of “The Long Night” occur.
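The bits-per-pixel figures in the last two paragraphs all fall out of one ratio: bitrate divided by pixels per frame and frames per second. The sketch below assumes a standard 24 frames per second, which the figures above imply but do not state.

```python
# BPP = bitrate / (pixels per frame * frames per second), assuming 24 fps.
def bits_per_pixel(bitrate_kbps: float, width: int, height: int, fps: float = 24.0) -> float:
    return (bitrate_kbps * 1000) / (width * height * fps)

print(f"Blu-ray:      {bits_per_pixel(25_000, 1920, 1080):.2f} bpp")  # ~0.50
print(f"UHD Blu-ray:  {bits_per_pixel(55_000, 3840, 2160):.2f} bpp")  # ~0.28
print(f"4K streaming: {bits_per_pixel(20_000, 3840, 2160):.2f} bpp")  # ~0.10
```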
Further complicating image delivery is the significant shift in color encoding amidst the adoption of 4K. Rec. 709, a term often used interchangeably with “standard dynamic range” (it’s one of many SDR standards), was the near-universal standard for color encoding on consumer displays throughout the early 21st century. This changed in the mid-2010s with the advent of HDR as a consumer product and the industrial push toward display encodings allowing for a far wider gamut of displayable colors and luminance levels. Today, we’re still in that transitory phase, and immense pressure to adopt the format has been placed upon filmmakers and viewers alike. Filmmakers must now deliver multiple color grades for multiple exhibition formats—SDR, DCI-P3, HDR10+—alongside proprietary HDR formats such as Dolby Vision. These formats often require entirely new color grades with drastic changes, and cash-strapped projects or those operating under apathetic management sometimes automate the process, with predictably shoddy results.5
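For a sense of what an automated conversion actually does, here is a generic global tone-mapping curve, a simple Reinhard-style operator, standing in for the HDR-to-SDR step: one formula squeezes a wide luminance range into a narrow one, with none of the shot-by-shot judgment a colorist would apply. Dolby Vision’s analysis and trim system is proprietary and far more sophisticated; this is only an illustrative sketch.

```python
# Generic global tone mapping: compress HDR luminance (nits) toward an SDR range
# with a single curve, L / (1 + L), applied identically to every shot.
import numpy as np

def reinhard_tonemap(nits: np.ndarray, peak_hdr: float = 1000.0, peak_sdr: float = 100.0) -> np.ndarray:
    normalized = nits / peak_hdr
    compressed = normalized / (1.0 + normalized)   # Reinhard curve maps the HDR peak to 0.5
    return compressed * peak_sdr * 2.0             # rescale so the HDR peak lands at the SDR peak

highlights = np.array([10.0, 100.0, 500.0, 1000.0])   # specular detail spread across the HDR range
print(reinhard_tonemap(highlights))                   # squeezed into a far narrower SDR band
```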
Viewers are placed in an even more unenviable position: They must not only know how to calibrate their television, but also how to shepherd multiple distinct image encodings through an image pipeline. Is your disc SDR, HDR10+ or Dolby Vision? Is your television recognizing that or even capable of doing so? Is your tone mapping algorithm working correctly? Have you separately calibrated your television in the (often very different) menus of its SDR and HDR modes? Vaziri sums it up: “I’m not criticizing people for not knowing how to tune their TVs. They shouldn’t have to know how to tune their TVs.” We can’t blame viewers for their poorly managed color when the barrier for technical understanding is so high.
Filmmakers are by no means exempt from this confusion, and misunderstandings about color and display management can have a profound impact on the way movies look—search no further than the “low contrast” trend of the early 2010s, driven in part by filmmakers who embraced their cameras’ then-newly popularized logarithmic color encoding without understanding what it was for. The specifics of what HDR encoding entails, its intended and best uses and its aesthetic impact remain poorly understood among much of the filmmaking community. This lack of understanding causes as many problems as the format could ever hope to solve.
How are filmmakers, audiences and technology manufacturers to address these problems? The solutions can’t be reduced to one or two fixes, and even so, yelling “stop” in the face of artistic currents has never proved particularly efficacious. It is, however, worth separating out blanket condemnations of a trend—“stop with the dim and muddy cinema!”—from a healthy skepticism of trends, which, by their very nature, exert a gravitational pull on artists. When a certain aesthetic tic becomes fashionable, artists who might not have otherwise embraced it find themselves looking toward it as a default state. Dim visuals and buried dialogue are just like any other aesthetic mode: apt for certain projects and processes, not necessarily a good fit for others. It’s not any particular artistic gesture or current, but the very nature of herd mentality, that can cause aesthetic issues when they arise.
“There’s an interesting schism between filmmakers who see themselves primarily as storytellers and those who don’t,” says Mangini. He places himself in the latter camp but has a word of advice for those like him: “Most mainstream cinema is about story. The cinematographers who struggle are the ones who forget the fact that light needs to be emotionally right for the moment: We need to see the actor’s face if we want to communicate. And that’s something that’s lost. The same applies to sound: We’re a verbal species!” His message is simple: Know why you’re making aesthetic decisions, regardless of what’s fashionable.
As for that yawning gap between artistic intent and exhibition, there exist clear steps that might help us bridge it. Soffer urges filmmakers to take into account less than ideal viewing conditions, at least to a degree: “Simply dismissing all potential [compromised] viewing conditions is not ideal…. Especially when grading a commercial or music video, different grading choices do need to be made to accommodate for viewing environments.” Mangini advocates for a set of agreed-upon standards for sound mixes in cinemas akin to those applied to television shows: a “good housekeeping seal of approval,” perhaps also based on LKFS levels. Blasingame paraphrases DP Steve Yedlin: “We shouldn’t be fighting for more pixels. We should be fighting for better pixels.” A return to lower spatial resolutions would allow for a significant increase in the efficiency and quality of bandwidth-limited images: “A good 2K image is so much better than a compromised 4K image.”
Industry attempts to educate customers as to best practices for audio and visual equipment such as “Filmmaker Mode” can only go so far. Home viewers need to be set up for success, which means a refocusing of marketing and production away from the latest branding and toward intuitive systems that deliver reliable images and sounds to nonspecialists. A 4K (or 8K) HDR signal yields, in practical terms, only marginal benefits over a 1080p SDR signal, yet consumers are being sold on those marginal benefits, often to their detriment. Though the clock cannot be wound back, consolidating the current state of the art should take priority over the further complications that will arrive when these standards are, as market forces dictate, inevitably superseded. Systems that optimize fidelity and intuitiveness should be prioritized over the pursuit of larger numbers.
When audiences widely reject popular trends in contemporary cinema, they often blame filmmakers and their tools: Filmmakers, for their part, are apt to see this rejection as a conservative aesthetic impulse. Both parties, however, share a vital interest in promoting better, more intuitive delivery and exhibition systems as well as mutual understanding and education as to the mechanics of these systems and their best practices. It is the first step to restoring openness on the part of audiences to new and challenging aesthetic norms and a healthier cinematic landscape.
1 This I can personally attest to: The only time I’ve received the dreaded “knock” from the downstairs neighbor was during a home screening of Tenet.↩
2 A useful experiment: Sit in your living room (or wherever you watch content) for a whole minute in total silence. Identify the sounds you hear. There are probably many more than one would first think to identify!↩
3 Commonly, and largely incorrectly, referred to as “CGI,” or computer-generated imagery, in this context.↩
4 This depends on the codec being used, but 20 Mbps appears to be the ceiling as of this writing.↩
5 Luckily, this situation is, to a degree, gradually improving with the introduction of tools like the Dolby Vision Cinema Targets & Trims, though they are still not a set-it-and-forget-it solution.↩