Some Thoughts On Spatial Audio
Spatial Audio Resolves An Important Ontological Problem, But Introduces An Artistic One, Much Like 4K -- With Specific Reference To Oneohtrix Point Never and David Lynch
stereo has been around for a while. it’s a very simple structure: two ears, so two speakers, whether those speakers are on your head or in front of you. it’s great, we love it when there are Two Of Them.
thing is, the tech (driven by demands from VR) wants to push forward towards “spatial audio”, where what’s created is a sculptural audio “environment” in which particular sound sources are given Locations. Video game players like this, for sensible reasons: the higher the fidelity of the aural location of an event of interest (enemy, animal, river, etc.), the more a skilled player can rely on it as a signal, whether that shows up as a more spatially precise inner representation or as a lower error rate for signals on that channel.
in terms of personal sound, we are thinking of the spatial audio on the AirPods Max, and in terms of room sound, we are thinking of the kind of wavefield synthesis that recent developments in so-called AI have made more plausibly solvable (i’m thinking here of a wavefield synthesis rig that can take a LIDAR reading of the space it is in, calculate dynamically how the frequency response of the room sound will embed itself within the room, and introduce signals that cancel the standing waves, doing a kind of “noise cancelling” for a closed room of arbitrary contents. it would be just a fucking lot of convex optimization in the frequency domain, but that’s what NVIDIA’s for now, right??)
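for what it’s worth, a single frequency bin of that optimization is small enough to sketch. this is a minimal sketch under heavy assumptions: free-field point-source transfer functions stand in for whatever a LIDAR scan would actually let you model, the thing being cancelled is one hypothetical unwanted source rather than a true room standing wave, and the positions, `greens`, and `cancelling_drives` are all made up for illustration. it solves a regularized least-squares problem (convex, with a closed form) for speaker drive signals that minimize the summed pressure at a cluster of control microphones. it is not how any shipping product works.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, roughly room-temperature air


def greens(sources, receivers, freq):
    """Free-field point-source transfer function exp(-jkr) / (4*pi*r).

    Returns an (n_receivers, n_sources) complex matrix. A real room would need
    measured or modelled transfer functions that include reflections.
    """
    k = 2 * np.pi * freq / SPEED_OF_SOUND
    r = np.linalg.norm(receivers[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)


def cancelling_drives(G, p, reg=1e-3):
    """Minimise ||p + G q||^2 + lam * ||q||^2 over complex speaker drives q.

    Closed form: q = -(G^H G + lam I)^-1 G^H p. `reg` sets lam relative to the
    average diagonal entry of G^H G so it scales with the geometry.
    """
    GhG = G.conj().T @ G
    n = GhG.shape[0]
    lam = reg * np.trace(GhG).real / n
    return -np.linalg.solve(GhG + lam * np.eye(n), G.conj().T @ p)


freq = 80.0  # Hz, low enough that room modes matter
unwanted = np.array([[4.0, 3.0, 1.0]])  # hypothetical noisy source position (m)
speakers = np.array([[0.5, 0.5, 0.3], [3.5, 0.5, 0.3], [0.5, 3.5, 0.3],
                     [3.5, 3.5, 0.3], [2.0, 0.2, 2.0], [2.0, 3.8, 2.0]])
rng = np.random.default_rng(0)
mics = np.array([2.0, 2.0, 1.2]) + rng.uniform(-0.3, 0.3, size=(8, 3))

p = greens(unwanted, mics, freq)[:, 0]  # pressure the unwanted source makes at each mic
G = greens(speakers, mics, freq)        # speaker-to-mic transfer matrix
q = cancelling_drives(G, p)
residual = p + G @ q                    # what is left after the speakers add their output

primary_energy = np.sum(np.abs(p) ** 2)
residual_energy = np.sum(np.abs(residual) ** 2)
print(f"energy at mics: {primary_energy:.3e} -> {residual_energy:.3e}")
```

a real rig would repeat this per frequency bin, and the regularization term is what stops the speakers from being driven absurdly hard to chase a perfect null.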
thing is, this creates an interesting challenge for people who make audio-based art. the stereo field is two channels that represent the hardware environment into which many (sometimes hundreds of) microphone channels are fed down. right now, the state of the art is figuring out fun ways to do “trompe l’œil” while projecting those n channels onto a two-channel space. when you do this, you’re summing a bunch of the channels, so the trick is to kind of convert the space of possible variation into one that encodes the most meaningful aspects of the original group of signals. the problem, though, is that you’re playing a game against the tensions of constructive and destructive interference. so you have to have technical know-how of how signals combine and how they can be cut to fit one another like bricks in a wall, BUT ALSO social know-how to execute on questions of genre appropriateness (what kinds of bands are there, how do these different kinds of bands answer the characteristic questions of recording differently, and in what way do audiences local to them understand that difference), BUT ALSO art know-how to see how the song relates to itself as projected into a stereo recording. and so you use frequency-domain tools to reduce, as artfully as possible, the number of channels going to the folks at home, which takes all of that knowledge to execute well.
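to make the summing problem a bit more concrete, here is a toy mixdown (a minimal sketch, not how any particular console or DAW does it): n mono tracks get folded into two channels using a constant-power pan law, which is one common convention among several. the two hypothetical tracks are a close mic and a second mic catching the same 60 Hz tone in opposite polarity, so when they are summed, destructive interference leaves a tenth of the level a single centred mic would have. `pan_gains` and `mixdown` are illustrative names, not a library API.

```python
import numpy as np

SR = 48_000  # sample rate in Hz


def pan_gains(pan):
    """Constant-power pan law: pan in [-1, 1], -1 hard left, +1 hard right.

    left**2 + right**2 == 1 at every pan position, so perceived loudness
    stays roughly constant as a track moves across the field.
    """
    theta = (pan + 1) * np.pi / 4  # map [-1, 1] onto [0, pi/2]
    return np.cos(theta), np.sin(theta)


def mixdown(tracks, pans, gains):
    """Sum n mono tracks (shape n x samples) into a 2 x samples stereo buffer."""
    out = np.zeros((2, tracks.shape[1]))
    for track, pan, gain in zip(tracks, pans, gains):
        left, right = pan_gains(pan)
        out[0] += gain * left * track
        out[1] += gain * right * track
    return out


def rms(x):
    return np.sqrt(np.mean(x ** 2))


t = np.arange(SR) / SR
kick_mic = np.sin(2 * np.pi * 60 * t)  # hypothetical close mic on a kick drum
overhead = -0.9 * kick_mic             # same source, picked up in opposite polarity
mix = mixdown(np.stack([kick_mic, overhead]), pans=[0.0, 0.0], gains=[1.0, 1.0])

print(f"one mic, centred: {rms(np.cos(np.pi / 4) * kick_mic):.4f}")
print(f"both mics summed: {rms(mix[0]):.4f}")
```

in practice engineers catch this by flipping polarity or nudging delays on one track; the point of the sketch is just that summing is where channels can eat each other.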
NOW. what spatial audio actually does is provide a way of consistently representing channel-size signals at psychoacoustically identifiable locations, through specific speaker systems combined with tuned filters that move signals in three dimensions in the sound field, rather than two.
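to make the “tuned filters” idea slightly more concrete: real spatial audio renderers typically convolve each source with head-related transfer functions (HRTFs), which also handle elevation and front/back. the sketch below is far cruder and only covers azimuth. it places a mono signal by delaying and attenuating the ear facing away from the source, using Woodworth’s spherical-head approximation for the interaural time difference. the head radius is a commonly quoted average, the 6 dB maximum level difference is an arbitrary assumption (real level differences depend strongly on frequency), and `place` is a hypothetical helper, not any platform’s API.

```python
import numpy as np

SR = 48_000             # sample rate in Hz
HEAD_RADIUS = 0.0875    # metres; an often-quoted average adult value
SPEED_OF_SOUND = 343.0  # m/s


def woodworth_itd(azimuth_deg):
    """Interaural time difference in seconds from Woodworth's spherical-head
    formula, for azimuths from -90 (hard left) to +90 (hard right) degrees."""
    theta = np.radians(abs(azimuth_deg))
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + np.sin(theta))


def place(mono, azimuth_deg, max_ild_db=6.0):
    """Return a (2, n + delay) array (left, right) with the source at azimuth_deg.

    The ear facing away from the source hears the signal later (ITD) and
    quieter (a crude broadband level difference scaled by sin(azimuth)).
    """
    delay = int(round(woodworth_itd(azimuth_deg) * SR))
    far_gain = 10 ** (-max_ild_db * abs(np.sin(np.radians(azimuth_deg))) / 20)
    near = np.concatenate([mono, np.zeros(delay)])
    far = far_gain * np.concatenate([np.zeros(delay), mono])
    if azimuth_deg >= 0:  # source to the right: the right ear is nearer
        return np.stack([far, near])
    return np.stack([near, far])


click = np.zeros(256)
click[0] = 1.0
stereo = place(click, 60.0)
print("ITD at 60 degrees:", int(round(woodworth_itd(60.0) * 1e6)), "microseconds")
print("left ear peak at sample", np.argmax(stereo[0]),
      "| right ear peak at sample", np.argmax(stereo[1]))
```

a real renderer would swap the delay-and-gain pair for a measured filter per ear and per direction, which is where the standardization the essay talks about actually lives.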
so, at the level of the ontology of recording, you can now have the construction of a 3-d space taken care of for you by pre-made, standardized technology that the composer can count on, as well as a system that can be used to “place” signals, which keep their integrity as single channels by having identifiably unique locations within the sound field.
this gets rid of the “trompe l’œil”, but it also makes things Way more about the quality of the initial signals, rather than the fitting-together of signals to achieve a unity of purpose. or rather, the unity it achieves now is one whose goal has to be its creation in the mind of the viewer, rather than on the opaque 2D of a stereo recording.
THE THING IS: this means that a lot of genre techniques that have to do with achieving specific and socially identifiable genre-effects through the use of a trompe l’œil in mixdown can’t plausibly take advantage of the new technology. what you would have to do, to retain sonic genre, is figure out what the social recognizability of those aspects was meant to achieve, and then work out how to achieve those things within the new technical space.
I would like to see someone like Dan Lopatin actually take on the challenge of composing specifically within this medium, but in a way meant to communicate something external to the existing questions of music “genre”, something that IP law currently rules out by preventing widespread sampling.
I think Magic OPN kind of abdicates on it, because Lopatin just takes us on a genre walkthrough of things he’s anxious about or things he enjoys remembering, as a way of showcasing what the tech can do. that’s helpful, but it’s more of a demonstration record than a real artwork. He mostly sounds lonely and confused and looking for some reminder of what makes him who he is. Bums me out.
HOWEVER, I also particularly believe in his ability to do that, given that Garden of Delete executed on the whole thing of 90s khaki computer peripherals as techno-Desert Storm II death, as suburban backyards where mp3 players had just been invented. the genre work accomplished the identificatory thrust of the music’s meaning. i don’t think anyone else “got” what 2004 was like for a certain kind of boy in the way that Lady Bird “got” 2004 for a certain kind of girl.
like, nothing in culture is straining against its inability to become sound-sculpture, so the operation of genre won’t produce the form, which is why we would need someone like Dan Lopatin to take the anticipatory step of announcing and explaining the goal.
the opportunity now exists to take a sculptural approach to audio, but we don’t have anything to point to as examples of even attempted sound-sculpture, simply bc the tech has never existed, so there’s no sense of how to understand something produced as one unless it’s announced as one, because no genre will grow towards it: the sculptures they produce will be sculptures of paintings with the traditional kind of painting painted on them. this is something like the problem of the fourth wall, but weirder.
the core idea is that the details of specific channel-points need to be managed as the point where “meaning” is generated in order to maximize the compositional opportunities presented by the form. a rock band is about producing an effect as a whole that recordings then represent. the thing is, it’s a small number of pieces. at the bit depth we are able to do now, and with some decent ear training, you really can hear all the way into this stuff. the problem is, we just don’t really have anything obviously ready to be composed in such a way as to take advantage of new tech like this.
basically what i’m worried about is that musicians are not clever enough to know how to use this compositionality and video game people are too artless to make something worth experiencing this way, even if they know the tech better.
we see a similar problem with 4k so far. people mostly use it to make smoother, sharper versions of the thing they were already going to make. in places where nature fills in all the details (so, any 4k nature documentary), the tech itself is able to stand out bc the source material is, in some sense, already the way it’s depicted, and is detailed at as minuscule a level.
the problem comes in when this stuff is used for narrative: the details all get sharper than any of the detail-makers care to pay attention to. the fidelity of the medium is higher than the imagination of the filmmaker, and you get the inclusion of endless meaningless details that undermine the work by presenting neither detail for its own sake nor detail for the sake of the film, but rather Accidental Detail. this is ugly and dispiriting. even when people get what the problem is, they end up resolving it by “having nature take care of it”, as in the Hobbesian theodicy of Triangle of Sadness. it’s a cheap cop-out, but at least it’s the characteristic cop-out taken by incipiently monarchistic critics of liberalism’s failure to have already reached its goals, so it’s funny in context, if icky.
the way we know this is true is that we have an example of this fate being avoided by someone whose imagination actively tries to adjust itself to the fidelity of the medium he is working in: David Lynch. he says all over the place that with each film he looks at the state of the art, and then engages that state-of-the-art tech in a painterly manner, asking what of its subject it can represent uniquely that past media couldn’t.
this was expected to be a huge problem with Twin Peaks s3, since the soft-focus VHS look of the first two seasons is not only beloved, distinctive, and well-understood, but was also at the height of a supercycle of fashionability when season three came out.
thing is, season 3 makes completely clear from the outset that Lynch is aware of how much information goes into the amount of space made available by 4k. i mean, just look at it. i’m sure i’ll have to give an actual argument for it, but every blade of grass gives the sense of “having been paid attention to” in terms of how it’s represented.
this is even better, because the gap in time lets the difference in media of representation express itself as the distance in time between seasons one and two and season three. in the first two seasons, one was working in analog, filming fields of light of different color, with the intention of projecting those fields of color at some different size on a home television. everything is floating and hazy, and made all the more so by looking like what representations of that era looked like to the people living in it, no less than the decor in the houses does. in the third season, by contrast, you can tell Lynch knows that he is filming 8,294,400 individual pixels, specific determinate spots with specific determinate details. this is even truer on an OLED tv, where each individual point of color produces its own light, whereas earlier LED tvs (really LED-backlit LCDs) relied on a common backlight filtered into different colors. and the things he films that way are the things folks are now filming poorly in 4k, but filmed in a way that is composed, and that reflects the transition from analog to digital happening over the course of the series as a whole.
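the pixel arithmetic, for anyone checking: 8,294,400 is exactly one UHD “4k” frame. taking a 720 × 480 standard-definition frame as a rough stand-in for the original broadcasts (an assumption; VHS tape resolves less than that), the newer frame has 24 times as many addressable points:

```latex
3840 \times 2160 = 8{,}294{,}400, \qquad
720 \times 480 = 345{,}600, \qquad
\frac{8{,}294{,}400}{345{,}600} = 24
```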
the apotheosis of failing to do this ought to have been some kind of Outer Space B Movie, if we were still keeping up with tradition, but instead it’s the Marvel films. I can’t think of a firmer example of not paying attention to how the details and the whole integrate into a narratively satisfying unity than the Marvel Studios practice of filming actors and actresses delivering lines on green screen while a team unrelated to the team selecting lighting for the shoot meticulously draws out, in whatever virtual AutoCAD they use, all of the details of the explosions and distant sky and mountains.
the only rock and roll artists i can think of who plausibly addressed this issue are steely dan, who also made sure to operate at the forefront of how different ways of using the existing technology could “mean” differently. they say as much about Gaucho, and the icky 2000s brickwall L2 gloss of Two Against Nature really solves it. it’s basically the audio equivalent of Inland Empire.
At the heart of it, there is some cool new tech and as a critic, I’m worried our auteurs aren’t up to the task.