ANVIKSIKI inquiry · examination · clarity

Anvitābhidhāna and the Attention Mechanism

On meaning, and why nothing carries it alone

There is a quarrel about meaning that is now thirteen centuries old, and it was settled, or rather restated in a form we can no longer ignore, by a piece of engineering published in 2017. The quarrel was between two schools of Indian philosophy. The engineering was a paper called Attention Is All You Need. That these two should arrive at the same structural claim, separated by some thirteen hundred years and by every conceivable difference of language and intent, is the kind of coincidence that is not a coincidence. It tells us we have stumbled onto something about meaning itself.

Let me set out the quarrel first, because it is older and because the philosophers, working without the convenience of machines, had to be more careful.

The question the grammarians could not avoid

Consider an ordinary instruction: bring the cow. You hear it, you understand it, you act. Nothing seems to require explanation. But the philosopher's task is to be puzzled by exactly the things that seem to require no explanation, and here the puzzle is this: where, precisely, does the meaning of this sentence live?

The Mīmāṃsā philosophers — whose original concern was the correct interpretation of Vedic injunction, a matter on which a great deal was thought to depend — could not avoid the question. A sentence tells you to do something. To obey it you must understand it. To understand it you must know what its words mean and how they combine. So: do you first grasp each word's meaning in isolation, assemble these meanings like bricks, and only then arrive at the sentence? Or does the meaning of each word already arrive shaped by the others, never standing alone at all?

This is not a trivial distinction, and the two schools that formed around it did not regard it as one.

Abhihitānvaya: meaning assembled

The first answer belongs to the Bhāṭṭa school, following Kumārila Bhaṭṭa. Their position is called abhihitānvaya — from abhihita, "what is denoted," and anvaya, "connection." The order of those two words is the whole doctrine. First denotation, then connection. The position runs as follows.

Each word independently expresses its own meaning. Cow means cow; bring means the act of bringing; the case ending means the cow is the object of that act. Each of these meanings is delivered first, complete and separate. Only afterward does a second faculty — a synthetic understanding that the Bhāṭṭas had to posit specifically for this purpose — gather the assembled meanings and construct the sentence-meaning from them.

There is something reassuring about this. It respects the dictionary. A word means what it means; you could look it up; its meaning is its own property, carried with it wherever it goes. The sentence is a sum. This is, I will note now and return to later, almost exactly how a computer first learned to handle language: a word was a fixed vector, a settled quantity, looked up in a table. The meaning was in the word.

But notice the cost. The Bhāṭṭas must introduce a separate, secondary operation to explain how the assembled bricks become a building. Meaning in isolation comes first and easily; meaning in connection is the afterthought, the thing requiring extra machinery. And ordinary experience does not feel like this. We do not hear a string of self-contained meanings and then, in a second mental act, glue them. We hear the sentence.

Anvitābhidhāna: meaning in connection

The rival answer belongs to the Prābhākara school, following Prabhākara Miśra, who lived in roughly the seventh century. His doctrine is called anvitābhidhāna — and again the word order carries the argument. Anvita, "connected"; abhidhāna, "denotation." Connection first, then denotation. Or more exactly: the connection and the denotation are not two events. A word denotes its meaning only as already connected to the others.

Prabhākara's claim is that words never present their meanings in isolation in the first place. The word cow, as it functions in bring the cow, does not first mean "cow-in-the-abstract" and then get recruited into a sentence. It arrives already meaning "cow-as-the-thing-to-be-brought." Its denotation is shaped, from the outset, by the syntactic and semantic company it keeps. There is no inert, context-free meaning sitting in the word waiting to be assembled. The isolated word is, strictly speaking, not yet meaning anything in the relevant sense at all.

His evidence is the way a child learns language — and this is worth dwelling on, because it is one of the most acute observations in the whole tradition. A child, Prabhākara argues, never learns words from a dictionary. No one walks up to an infant and says "cow: a quadruped of such-and-such description." The child learns by watching. An elder says bring the cow, and another elder brings the cow. The child sees the whole event — the command, the response, the world rearranging itself in answer. From many such episodes, the child infers what each word contributes. But the child never once encounters a word doing its work alone. The word is learned in its connections, from its connections, as a connection. Meaning is gathered from use in context, never from isolation, because isolation is precisely the condition under which language is never found.

The doctrine has a name worth keeping: anvitasya abhidhānam — "the denotation of the connected." The connected thing is what is denoted. Not the isolated thing, later connected.

This is the position I have taken for the name of this practice, and not for ornament.

What the two schools were really disagreeing about

It would be easy to treat this as a technical squabble among scholastics, the kind of thing that gives philosophy its reputation for splitting hairs. It is the opposite. The two schools are disagreeing about where intelligibility resides — and that question does not stay confined to grammar.

The Bhāṭṭa picture is compositional and atomistic: the parts are primary, real, self-sufficient; the whole is derived. The Prābhākara picture is holistic and relational: the relation is primary, and the apparent self-sufficiency of the part is an illusion produced by abstracting it out of the only place it ever actually lives.

Hold that distinction precisely, because we are about to find it again, built in silicon, by people who had — I am confident — never heard of Prabhākara Miśra.

The machine that had to solve the same problem

For most of its history, the computational treatment of language was unrepentantly Bhāṭṭa. A word was given a fixed representation — in the modern era, a vector, a list of numbers, looked up in a table. Bank had one vector. It carried that vector into every sentence: the bank of a river, the bank that holds your money, the bank shot in billiards. The meaning was in the word, settled in advance, the same everywhere. Abhihita: denoted first, in isolation. Connection, if it came at all, came later and clumsily, as a second operation bolted on — exactly the Bhāṭṭa afterthought, exactly its difficulties.

And it did not work well, for exactly the reason Prabhākara would have predicted. A word does not have one meaning that it carries everywhere. Bank near river is not the bank near deposit. The fixed vector is a dictionary entry, and Prabhākara had already pointed out, some thirteen centuries earlier, that language is not learned from dictionary entries and does not behave like them.
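To make the fixed-vector picture concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the table EMBEDDINGS, its two-dimensional vectors, and the helper embed_static are hypothetical, and real systems learn tables with hundreds of dimensions. The point survives the simplification.

```python
# Hypothetical toy table: these vectors are invented for illustration only.
EMBEDDINGS = {
    "the": [0.1, 0.1],
    "river": [1.0, 0.0],
    "deposit": [0.0, 1.0],
    "bank": [0.5, 0.5],
}


def embed_static(sentence):
    """Look each word up in the table, ignoring its neighbours entirely."""
    return [EMBEDDINGS[word] for word in sentence]


river_sentence = embed_static(["the", "river", "bank"])
money_sentence = embed_static(["the", "deposit", "bank"])

# "bank" receives exactly the same vector in both sentences.
print(river_sentence[2] == money_sentence[2])  # prints True
```

Whatever company the word keeps, the lookup returns the same entry. That is the dictionary picture, and nothing in the table can tell one bank from the other.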

The breakthrough — the attention mechanism, and the transformer architecture built upon it — is, when you strip away the linear algebra, a conversion from the Bhāṭṭa picture to the Prābhākara one. I will describe it plainly.

In an attention mechanism, a word does not keep a fixed meaning. Instead, every word in a sentence looks at every other word and asks, in effect: how relevant are you to what I now mean? Each word issues a kind of query; each word offers a kind of key; the match between one word's query and another's key determines how heavily that other word's contribution (its value, in the jargon) is weighted. The representation of a word is then recomputed as a blend of what every word in the sentence contributes, itself included, weighted by how much each one matters to it in this particular context. Bank sitting beside river attends strongly to river, and its representation shifts toward the geographical sense. The same bank beside deposit attends elsewhere and means something else.
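The query, the key, and the weighted blend described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the architecture of the 2017 paper: the vectors are invented, each word's vector serves as its own query, key, and value (a real transformer learns a separate projection for each), and there is one attention step where a real model stacks many layers and heads. The names EMBEDDINGS and self_attention are hypothetical.

```python
import math

# Hypothetical toy vectors: axis 0 is "river-ness", axis 1 is "money-ness".
EMBEDDINGS = {
    "river": [1.0, 0.0],
    "deposit": [0.0, 1.0],
    "bank": [1.0, 1.0],
}


def dot(a, b):
    """Dot product: how well a query matches a key."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw match scores into positive weights that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sentence):
    """Recompute each word's vector as a weighted blend of all the vectors.

    Simplifying assumption: the same vector plays query, key, and value.
    """
    vectors = [EMBEDDINGS[word] for word in sentence]
    dim = len(vectors[0])
    contextual = []
    for query in vectors:
        # Scaled dot-product score of this word's query against every key.
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # Blend the values, weighted by relevance to the querying word.
        blended = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        contextual.append(blended)
    return contextual


bank_by_river = self_attention(["river", "bank"])[1]
bank_by_deposit = self_attention(["deposit", "bank"])[1]

# The same table entry for "bank" ends up as two different vectors.
print(bank_by_river[0] > bank_by_river[1])      # prints True
print(bank_by_deposit[1] > bank_by_deposit[0])  # prints True
```

In this toy, bank beside river leans toward the river axis and bank beside deposit leans toward the money axis, although both began from the same table entry. The difference comes entirely from the neighbours.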

The word's meaning, in other words, is not intrinsic. It is computed from its relations to everything around it. The isolated token, before attention runs, holds nothing but its fixed table entry, and is as inert as Prabhākara said the isolated word always was. Meaning emerges only in the act of attending, in the mutual regard of the parts.

That is anvitābhidhāna, rendered in arithmetic. Anvita: connected. Abhidhāna: denotation. The denotation of the connected. The machine does not assemble fixed meanings into a sentence; it lets the meanings become themselves through their connections, which is to say it never lets them be fixed at all.

I want to be precise about the strength of the claim, because precision is the whole point of this essay. I am not saying the engineers were inspired by Prabhākara; they were not. I am not saying the philosophy and the architecture are the same thing. They are not: one is a theory of how human understanding works, the other an engineering method, first built for translating text and now used chiefly for predicting the next word. The claim is structural and it is exact: both concluded that a unit's meaning is not a property the unit possesses but a value computed from its relations to its context. Both rejected the atomism of fixed, portable, isolated meaning. They were answering the same question, and they gave the same answer.

Why this should be taken seriously, and not merely as a charming parallel

Here is the point I most want to make, and it is a point about philosophy itself.

It is fashionable to suppose that the hard, serious thinking happens in mathematics and engineering, and that philosophy offers, at best, decoration after the fact — a humanities gloss on results the technical disciplines achieved on their own. This essay is, among other things, a counterexample, and I would ask you to weigh it as one.

The grammarians had no machines. They could not run an experiment, could not measure a result, could not bury an unclear idea under successful predictions. They had only the demand for conceptual clarity under pressure, and that demand is not weaker than the mathematician's; it is in some ways more exposed, because there is nowhere to hide. Prabhākara could not say "it works, therefore it is right." He had to say why meaning must be relational, had to argue it from the way a child learns, had to defend it against the Bhāṭṭa objection that surely a word means something on its own, and had to bear the cost of his own position openly. The Mīmāṃsā debates ran for centuries precisely because each side was forced to make its concepts sharp enough to withstand the other. That is not a softer activity than proving a theorem. It is the same activity, conducted on the concepts that are too fundamental to be handed off to a formalism.

And the proof of its seriousness is that the engineers, when they finally got far enough into the problem, were forced back onto the very distinction the philosophers had already drawn. They did not need to read Prabhākara. The structure of the problem of meaning is real enough that two traditions, with nothing in common, arrived at the same fork in the road and took the same turning. The philosophers got there first, with less, by thinking harder.

This is what I mean when I say these are not four subjects. Gödel found in mathematics a limit no mathematics could see past. Heisenberg found in physics a statement not about our instruments but about the world. Prabhākara found in grammar a fact about meaning that has now reappeared, unbidden, inside a neural network. The boundaries between fields are conveniences of administration. The questions do not respect them.

And why it bears on a life

I will close by returning the idea to the ground it grows from, because I do not raise these matters only for their elegance.

If Prabhākara is right — and the machine, in its blunt way, has now testified that he is — then nothing is understood in isolation. Not a word. And not, I would suggest, a career, a decision, a relationship, or a person. A subject choice considered alone, abstracted from the temperament and circumstances and people it is connected to, is exactly the isolated word: inert, and not yet meaning what it will come to mean. It acquires its meaning only in its connections — to what came before, to what is around it, to the one who must live it out.

Most of the difficulty in a hard decision is not that some fact is missing. It is that the connections have not yet become visible. The work, then, is attention in the oldest and most literal sense: the patient turning of each element toward the others until the whole, at last, declares what it means.

That is what anvitam names. It is why I chose the word. And it is why this is the first essay, and not a later one.
