A Short Note On Large Language Models
this post is about poetics considered geometrically, but does not have any actual math
"For it is not metres, but a metre-making argument, that makes a poem, - a thought so passionate and alive, that, like the spirit of a plant or an animal, it has an architecture of its own, and adorns nature with a new thing."- Emerson, “The Poet”
“There’s a great parallel throughout this piece about how chess did not stop once engines got stronger than any human player - if anything, the ability to consult an engine improved human game (but not by enough to beat an engine!)” - Trombley, in the edits
People are making wild assertions about Large Language Models.
Some people are saying that they are “AI.” I don’t actually know what artificial intelligence is or would be, or what would make an instance of intelligence artificial as opposed to “natural.” I get it as an outgrowth of the production of things like “expert systems,” but the idea that there is a “general intelligence” is a load of hokum. I think this because intelligence is a local quality–immanent to what it develops in response to–rather than a global quality that can be unproblematically compared or evaluated. I am holding out for the completion of Jamesian Phenomenology or the system of German Idealism before confidently asserting a definition of intelligence. Yes, I have a copy of Reza Negarestani’s Intelligence and Spirit.
Some people are saying that they’re going to muck up a bunch of methods of social validation of claims, making fraud way easier and “displacing” brain-workers in a John Henry vs. the steam drill kind of way. For teachers, this is kids becoming prompt engineers rather than five-paragraph-essay engineers. For lawyers, this might be building some kind of ultra-Shepardization machine. Presumably a large language model can recommend that a luxury brand replace its current logo with an all-caps sans serif font faster than a graphic designer could.
Some people are saying that they’re going to allow us to build a kind of machine meta-language such that there’s no real point in learning any particular programming language anymore. This is, I think, true. But it’s true in the same way that the development of virtual machines for programming replaced raw machine code everywhere outside of a handful of colocated servers over in Secaucus and some chips flying through outer space.
I wanted to write this short thing to get some notes down on how I am thinking about them right now, because I think a lot of these viewpoints are wrong (not really the last one, though; a natural-language meta-compiler makes the most sense for how to integrate the computer into society in a broadly democratic way).
I don’t think LLMs are “AI”; I think they are a kind of “art” that allows geometric compression of plausible language structures, and the characteristic method of interaction is to explore the manner in which they compress a given corpus. I think of them as a halfway point between a neusis (the geometer’s trick of sliding a marked ruler until things line up) and an amanuensis (a scribe taking dictation). Now, I also think something similar about electronics: if LLMs are a kind of proto-poetic object, the Apple M1 chip is the most recent thing to come out of a specific lineage within the long-running human art of sculpture.
What’s funny about this way of looking at it is that it correctly moves all anxieties about these models and their use into anxieties about the political construction of societies and the intentional construction of audiences. The answer isn’t really to restrict them per se, but to respond with a combination of Nelson Goodman-inflected Audience Development and Deweyan Democracy. But then, that’s my answer to every problem. I think it is good for political things to be openly political, and for the political uses of less-than-political things to be identified as such.
Thing is, what an LLM really is is something like a vectorization of language-token space, and that is just really cool on its own terms.
Question-and-answer works well, because you’re essentially asking, “what set of tokens in what order is symmetric to the prompt, given the LLM’s geometrization of the particular language corpus it was trained on?” There are obviously all kinds of other fancy ways to deal with this, but the core question a prompt asks an LLM to answer is: “this prompt sits at an extremely high-dimensional point; when it is passed through the LLM (represented as a function), it arrives at a different, also ultra-high-dimensional point. What is going on at that point? Please represent this in language-form.”
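To make that concrete, here is a deliberately toy sketch in Python. Everything in it is invented for illustration (the vocabulary, the random “embeddings,” the averaging trick), and it is nothing like a real transformer; it just shows the shape of the operation: map a prompt to a point in a vector space, then read off which tokens the geometry ranks as closest to that point.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny made-up vocabulary with made-up vector positions. In a real model
# these positions are learned from a corpus; here they are just random.
vocab = ["the", "poem", "is", "a", "machine", "of", "words"]
dim = 8
embeddings = rng.normal(size=(len(vocab), dim))

def encode(prompt_tokens):
    """Map a prompt to a single high-dimensional point (mean of its token vectors)."""
    idxs = [vocab.index(t) for t in prompt_tokens]
    return embeddings[idxs].mean(axis=0)

def next_token_distribution(point):
    """Score every vocabulary item by its alignment with the prompt's point,
    then softmax the scores into probabilities."""
    logits = embeddings @ point
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

probs = next_token_distribution(encode(["the", "poem"]))
for tok, p in sorted(zip(vocab, probs), key=lambda pair: -pair[1]):
    print(f"{tok:>8s}  {p:.3f}")
```

The answer you get back is just a report on the neighborhood the prompt landed in.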
The thing is, this is just a radicalization of something that has been known about since the development of poetry as a method of language use. This is maybe a bit of an idiosyncratic view, but I’ve always understood the “point” of poetry to be in the demonstration that the image-arrangements or “argument” of the poem are already lying “in the language as such” in ways that are evidenced by their expressibility in rhyme and meter.
This is basically the old Emersonian point at the top of the page: a “poem” is when an argument is so much itself that it can simply appear, and take a metrical form as an indication of its always-already having been present in the language, but just not organized into a poem yet. What the poet does is notice the concatenation of the geometric “fact” of the poem’s possibility within a particular rhyme-and-meter space, and point that out. This is why I have always considered poetry to be a version of nonfiction: the poem is the words that are there where there is a pointer labeled “poem”.
Where the LLM is a parameter-space representation of a set of the internal relations of the corpus it was trained on, the point of studying poetry before writing it was to develop an internal representation of the set of internal and external relations of the corpus that the poet considers “poetry.” Or at least that was the story told before it became about expressing oneself.
Now, we also know that meter and rhyme are just two dimensions within the grand scheme of language-quantization. There are plenty more out there. I don’t mean this piece to imply that “free verse” isn’t poetic because it isn’t rhymed and metered. Rather, the absence of rhyme or meter works more like the way “atonal” music worked in the early 20th century: these words still have a power of relation, even if that relation is the specificity of the particular kinds of non-rhyme and non-meter arrayed against themselves.
However, rhyme and meter do give us good insight into what is so cool about Large Language Models: they generalize the point we see from Harold Bloom (the only “answer” to a poem is another poem) and Jacques Derrida (since no speech-act can be considered to “fail” [ed. note: this point is Tendentious] when considered as a poem, the only “answer” to any statement is always simply another statement, and thus there is “rien hors du texte”). It might be easier to say “rien hors du texte dans le texte” here. And if that’s the case, a poem is just a fact about the internal structuration of language, which can sometimes evoke shockingly specific “meanings”.
You can give it whatever prompt you like, and it will give you back whatever – within the geometry of the corpus it started with – the LLM’s functional representation of that corpus registers as “symmetric” to the input, in a quasi-poetic sense. The Anxiety of Influence is essentially a Freudian gloss on this notion within poetry. “Strong Poets” in the Bloom sense are just those who are capable of not-failing at producing a poem. The problem with being one is that you only have your foreparents’ not-failings at producing poems to base your poems on. So, you wind up stuck in a version of the Family Romance with Elizabeth Barrett Browning or John Ashbery or whoever.
The thing is, the discussion about the relation between generative models and “art-making” has been really, really dire and, I think, actively counterproductive.
On the image side, a lot of it has been anxieties about the loss of revenue to internet artists who used to make a living taking commissions. Some generative models can definitely fill commissions far more cheaply than artists can. Artists don’t like this because it disturbs an ecological niche. I think that is totally a valid way of thinking about it, but I’m really not sure at all that “strengthening intellectual property laws” is the way to respond to this. That will…likely backfire on them in ways they will not like later. It’s never a good sign when you find yourself on the same side of an argument as Elsevier.
People at large seem more scared of the language aspects than the pictorial ones, though. We use language to do verification – both socially and in simpler intersubjective ways – all the time. This is the philosophical point of a Turing test: if we constrain verification to text channels, can that verification be defeated by a program designed to beat it? Whether or not strictly text-based verification is invincible or un-dupe-able seems like a terrible way to define “intelligence,” but maybe an ok starting point for a “general theory of social ‘passing.’” But having that conversation would require us to get the gender studies/queer theory people talking to the computer scientists. This is famously hard in cases where each individual is not already involved in both.
Now, we have built a political economy that relies on tons and tons of verification constantly. I am always entering six-digit codes that I got in one interface into another interface just to do normal daily tasks. This will be an issue, but it won’t be a primarily artistic issue. I think we can safely move away from Fred Jameson on this, and should probably go back to early Baudrillard.
I think this is a much fairer way to read what it is that LLMs do than reading them as attempts at “Artificial Intelligence.”
A while ago, I did the GPTweets thing on the corpus of my twitter account, and it was very funny to see certain formal-structural aspects of the way that I wrote on twitter show up so neatly. The thing is, what used to be called “formal aspects” of work can probably be represented as particularly hot sections in a heatmap of parameter space. They’re high-plausibility bits of grammar that also have clear functional relations to one another, and that fact is detectable when they are treated in bulk.
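A hedged sketch of that heatmap idea, with invented sample “tweets” standing in for a real archive: counting repeated n-grams is about the crudest possible way to surface these formal tics, but it already shows how habits that are invisible tweet-by-tweet light up when treated in bulk.

```python
from collections import Counter

# Invented sample "tweets" standing in for a real twitter archive.
tweets = [
    "the thing is, this is basically a geometry problem",
    "the thing is, nobody reads the replies",
    "honestly the thing is just a pointer labeled poem",
]

def ngrams(text, n=3):
    """All runs of n consecutive words in a text."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

counts = Counter(gram for t in tweets for gram in ngrams(t))

# The "hot" entries are the high-plausibility formal habits.
for gram, count in counts.most_common(5):
    print(count, gram)
```

A real LLM does something enormously more sophisticated than this, but the detectability-in-bulk point is the same.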
The thing is, this is almost literally how genre works in poetry (and plausibly in other media, though people tend to argue about those more vigorously and more often than about poetry). There is a set of “turns” or “tropes” or “stock images/characters/scenarios” that audiences are expected to be familiar with within the work of specific genres.
One of the troubles with the “postmodern” era as most understood it within artwork was the idea that audiences had gotten “unruly”: they either became dissatisfied with the tropes and sought turns that could not ossify into tropes, or just tried to wholesale-arbitrage tropes across genre-space.
But “arbitrage” only really works when something is artificially one-dimensional (like price, or rather, price-per-time-per-market). Cultural production is overwhelmingly high-dimensional. So, the outcome of a lot of that arbitrage was just awful dreck, and misplaced legs, and totally unruly deltas, where the people making it completely lost confidence in the validity of the set of tropes used, or lost confidence in the existence of plausible criteria for the validity or usability of tropes. It became either “Tropes Bad” or “Look Ma, I’m Using Tropes!”, neither of which is a really interesting methodology for relating form to content in ways that have only just become uniquely available (which is what I understand the point of art to be, but again, tendentious). I think this is part of why a lot of movies are bad lately.
I think the way this gets resolved at this point is through meta-tropes, but the problem with those is that they do not scale audience-wise in ways that advocates of “AI” like, because they are always local to specific social groups. Think about the way that cowboy tropes were seen by different groups of people in 1880, in 1920, in the movies in 1960, and by the Bundy family in the 2000s. These are all different, because even the simple “cowboy” locks into structures that are determinately available for interpretation at that point in time. And how are you supposed to connect “cowboy” to “gaucho” differently at those different points in time? It’s just hard to get everyone to know what everyone else was thinking at every other point. That’s the achievement of Hegel’s Geist in history, or Emerson’s overmind in sociable Reason.
Common awareness of tropes organizes audiences, formal use of tropes in the production of artwork organizes content, content organizes mind, and mind organizes awareness of tropes. Simple.
What LLMs can do is both help make us aware of latent commonalities within linguistic corpuses (good) and also produce oodles and oodles of completely endogenous dreck very fast. What this ends up demanding of an artist using them is an enhanced “awareness” of what particularly constitutes “nowness” and also how to explain and localize that “nowness” quickly for people. Everything else is basically a sideshow.
In conclusion, I just think LLMs are neat. But like all art, they’ll only be as good as the audiences they create. This is why I don’t like that crypto people are all over them now. We saw with the NFT thing that while they may understand art’s role within the ecosystem of money laundering, that does not mean they understand any of its other social roles. Anyway.
Love the Dewey-Goodman nexus. This is good stuff.
The validation of “AI” would imply that human comprehension and the lexical meaning of knowledge are GEOMETRICAL. This is an interesting view of it.
BTW, how large is the training corpus versus the stored “compressed” LLM weights of the running program (in terabytes or gigabytes each)? It would be interesting to compare the two sizes!
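A rough back-of-envelope, using the publicly reported GPT-3 figures (Brown et al. 2020) as a reference point: roughly 570 GB of filtered Common Crawl text made up the bulk of the dataset, against 175 billion parameters; 16-bit storage for the weights is an assumption here, and the numbers are illustrative rather than a measurement.

```python
# Back-of-envelope comparison of corpus size vs. weight size,
# using figures reported for GPT-3 (Brown et al. 2020).
corpus_gb = 570              # filtered Common Crawl text, as reported
params = 175e9               # parameter count, as reported
bytes_per_param = 2          # assumes fp16 storage
weights_gb = params * bytes_per_param / 1e9

print(f"training text: ~{corpus_gb} GB")
print(f"weights:       ~{weights_gb:.0f} GB")   # ~350 GB
print(f"ratio:         ~{corpus_gb / weights_gb:.1f}x")
```

So for that model the two are the same order of magnitude; the “compression” is less about raw byte counts than about what the geometry makes retrievable.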