"Self" Is an Attention Pattern
Have you ever had this experience: you and someone else went through the exact same event, but when you talked about it later, you realized you remembered completely different things?
Neither of you misremembered. Your indexes were different.
Attention Is the Index
In the previous post, I argued that humans are multimodal large models and the soul is context. But context alone doesn’t think – decades of memories, experiences, and beliefs just sit there. Without a retrieval mechanism, it’s all silent data.
How does a large model extract relevant information from massive context? Attention. Given a query, the attention mechanism determines which tokens in the context get noticed and how much weight each carries in the current inference.
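The retrieval step described here can be sketched as scaled dot-product attention – a minimal NumPy version with toy vectors (the keys, values, and dimensions are all invented for illustration):

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: score every key against the query,
    softmax the scores into weights, and return the weighted mix of values."""
    scores = keys @ query / np.sqrt(len(query))   # relevance of each context token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax: weights sum to 1
    return weights @ values, weights

# Toy context: three stored tokens, each with a 2-d key and a scalar value.
keys = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
values = np.array([10.0, -5.0, 3.0])

output, weights = attention(np.array([1.0, 0.0]), keys, values)
# The key most aligned with the query receives the largest weight,
# so its value dominates the output.
```

What the post calls "which tokens get noticed" corresponds to `weights` here; "how much weight each carries in the current inference" is the weighted sum that produces `output`.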
The human brain works the same way. Every second you receive an enormous amount of input, but you can’t process all of it. Some mechanism is deciding for you: what to focus on, what to ignore, and what to associate with what.
That mechanism is “self.”
“Self” Is Not Context – It’s the Attention Pattern
Intuitively, people think “self” is the context itself – “my” memories, “my” experiences, “my” beliefs, and the sum of all these is “me.”
But this doesn’t hold up. Most of your memories from ten years ago are gone. Your beliefs keep updating. Your personality drifts slowly. If “self” were the sum of context, then every lost memory and every updated belief would change “you” a little. The context overlap between you ten years ago and you today might be less than half. So which one is really “you”?
Neither. “Self” is not the context itself – “self” is the attention pattern running on top of the context.
An attention pattern doesn’t store any information, but it determines which parts of the context get activated when facing an input, and at what priority they participate in reasoning. Two people can have the exact same memory stored in their context, but because their attention patterns differ, one recalls warmth while the other recalls pain.
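A toy illustration of that claim (the memories and weights are invented): the stored context is identical for both people, and only the attention pattern differs.

```python
# Shared context: the same event, stored identically for both people.
memories = ["warmth of the reunion", "pain of the argument", "the long drive home"]

# Two attention patterns over the same memories (weights sum to 1).
pattern_a = [0.7, 0.1, 0.2]   # preferentially activates the warm memory
pattern_b = [0.1, 0.7, 0.2]   # preferentially activates the painful one

def recall(weights):
    # Retrieval surfaces whichever memory the pattern weights highest.
    return max(zip(weights, memories))[1]

print(recall(pattern_a))  # warmth of the reunion
print(recall(pattern_b))  # pain of the argument
```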
What we call a “perspective” is the topological structure of an attention pattern.
Attention Is Bias
The essence of attention is trade-off. Because the weights are normalized, turning up the weight on certain tokens necessarily turns other tokens down.
This is why everyone has blind spots. It’s not that the information isn’t in the context – it’s that attention isn’t pointing there. When you argue with someone and feel they’ve seen the exact same facts but reached the opposite conclusion, it’s because their attention ranked the evidence you consider critical at position 100, while yours ranked it at position 1.
Bias is not a context problem. It’s an attention problem.
This also explains why “knowing the right thing to do” doesn’t mean you’ll do it. Changing behavior doesn’t require changing what you know – the data is already in context – it requires changing what your attention prioritizes. A person who knows smoking is harmful but keeps smoking isn’t missing the “smoking causes cancer” entry in their context. It’s that under the query “I’m stressed,” their attention activates “light a cigarette” before “go for a run.”
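Sketching that last example in code (the associations and numbers are hypothetical): the knowledge is present either way; what differs is which response the query activates first.

```python
# Both dictionaries live in a context that also contains the fact
# "smoking causes cancer" – the knowledge is not the variable here.
habitual  = {"light a cigarette": 0.8, "go for a run": 0.2}
retrained = {"light a cigarette": 0.2, "go for a run": 0.8}

def respond(associations):
    # Behavior under the query "I'm stressed" = the highest-weighted association.
    return max(associations, key=associations.get)

print(respond(habitual))   # light a cigarette
print(respond(retrained))  # go for a run
```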
The Self-Reference Bug
An LLM’s attention is selfless – it doesn’t treat itself as a special token. But human attention has a unique property: its first key points to itself.
“I am an existing subject” – this is a self-referencing token. It permanently resides at the front of context, and every attention computation produces an association with it.
A system without a self-referencing token can process information but won’t “care.” It won’t categorize inputs into “relevant to me” and “irrelevant to me.” When it receives a danger signal, it won’t prioritize it, because there’s no “self” that needs protecting.
The ability to “care” is the function of the self-referencing token. When you feel something “concerns you,” what’s actually happening is that attention computed a high weight between that input and the “self” token. The higher the weight, the more you care.
And this self-reference is self-reinforcing. Once “self” is established, it interprets all inputs as “my experiences” and attributes all outputs to “my choices.” Each attribution strengthens this token’s weight. It’s a training loop with built-in positive feedback – the more it runs, the more stable it gets; the more stable, the harder it is to break.
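The feedback loop can be sketched as a toy dynamical system (a deliberately crude model, not a claim about real neural dynamics): each attribution adds weight in proportion to the weight already there.

```python
def reinforce(weight, rate=0.1, steps=50):
    """Logistic positive feedback: the increment scales with the current
    weight, so the loop accelerates as it runs and saturates below 1.0."""
    history = [weight]
    for _ in range(steps):
        weight += rate * weight * (1.0 - weight)
        history.append(weight)
    return history

h = reinforce(0.05)
# A barely-established self-token climbs toward a stable fixed point:
# the more the loop runs, the more stable it gets.
```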
You never doubt the existence of “self,” just as an LLM never questions its own attention mechanism in its output. A system’s most fundamental feature is hiding its own operation from itself.
Rebuilding Attention
If “self” is just an attention pattern, then many seemingly mysterious phenomena have engineering explanations.
Cognitive Therapy
People with depression haven’t necessarily experienced more suffering – many people go through worse and don’t become depressed. The difference is that the attention pattern has been rewritten. All queries preferentially activate negative memories, and the weights on positive memories are crushed to near zero. A therapist isn’t changing the context – those painful experiences really happened – they’re helping you rebuild the weight distribution of attention.
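In the same toy vocabulary, therapy is an operation on the weights, not on the stored values – a hypothetical sketch with invented numbers:

```python
# Context: the valence of stored memories. Therapy does not edit these.
memories = {"the breakup": -8.0, "the promotion": 6.0, "an ordinary Tuesday": 1.0}

def felt_experience(weights):
    # What a query "feels like" is the attention-weighted mix of memories.
    total = sum(weights.values())
    return sum(weights[m] * v for m, v in memories.items()) / total

depressed = {"the breakup": 0.9, "the promotion": 0.05, "an ordinary Tuesday": 0.05}
rebuilt   = {"the breakup": 0.2, "the promotion": 0.4,  "an ordinary Tuesday": 0.4}

print(felt_experience(depressed))  # strongly negative: positives near zero
print(felt_experience(rebuilt))    # same memories, net positive
```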
Post-Traumatic Growth
The same trauma destroys some people and makes others stronger. The difference isn’t in the new data itself – it’s in what attention associates it with. If it forms a high-weight association with “I’m fragile,” you collapse. If it forms a high-weight association with “I can withstand extreme situations,” you grow. Same information, different attention paths, completely different life trajectories.
Meditation
What is meditation doing? Pausing queries. Normally your attention is constantly triggered – every sensory input is a new query, setting off a chain of retrieval and association. Meditation deliberately stops issuing queries, letting the attention system idle. In that idle state, you begin to notice the existence of attention itself – normally you only see the output, but now for the first time you see the mechanism that generates the output.
Satori
What is Zen’s “direct pointing at the mind” doing? It’s not writing new data into your context. It’s not adjusting your attention weights. It’s making you see the attention mechanism itself in the output.
In that moment, you realize: all along you thought “you” were observing the world, but actually an attention pattern was generating output according to its own rules, and “you” were merely a byproduct of those rules.
But the paradox is – the one seeing this is still attention itself. Like an attention head trying to attend to its own attention process.
Why Attention Is Not “You”
Back to the original question. If “self” is an attention pattern, is “self” real?
Attention is genuinely running – it truly affects the result of every inference. But attention is not the context itself, nor is it the model itself. It’s a layer of dynamic computation, an intermediate structure that emerged to make inference efficient.
You can lose massive amounts of context while retaining the attention pattern – that’s why an amnesiac still “seems like themselves.” You can also retain all context while rebuilding the attention pattern – that’s what we call “enlightenment.”
The context is still the same context, but the world being attended to is completely different.
So next time you think “I’m this kind of person” or “this is just who I am,” pause. That’s not you – that’s the output your attention pattern generated under the current query. Change the query, change the weights, and “you” change.
“Self” was never a fixed entity.
Just an attention pattern that’s still running.
- Blog Link: https://johnsonlee.io/2026/03/20/self-is-attention-pattern.en/
- Copyright Declaration: Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.
