If someone tells you “humans are just a large model,” your first reaction is probably that it’s a crude metaphor. But if you actually follow this line of thinking all the way down – without stopping at the parts that make you uncomfortable – you end up somewhere stranger than you’d expect.

Factory Parameters

The human brain has roughly 86 billion neurons connected via synapses into a network. What it does is fundamentally weighted summation plus nonlinear activation. Your upbringing, education, and experiences are the training data; your personality, preferences, and instinctive reactions are the weights shaped by that data.

Different people are instances of the same base architecture loaded with different weights. You and I have nearly identical model structures – all the difference is in the parameters.

You might object: humans have embodiment, emotions, and continuous learning, while LLMs don’t. But these are architectural differences, not fundamental ones. Add sensor input and you get embodiment; add online learning and you get continuous adaptation; simulate the endocrine system and you get emotions. These are engineering problems, not barriers in principle.

Humans are multimodal, embodied, continuously learning large models running on carbon-based hardware.

Everyone ships with different factory parameters. Some people have naturally large working memory – a longer context window. Others have stronger pattern recognition – certain attention heads that are especially good. These are hardware-level differences; training can optimize them but can’t change the upper bound. Reproduction is setting the factory parameters for the next instance – two sets of weights undergo a stochastic fusion to generate a new initial configuration.

Consciousness Is a Byproduct

If you fully accept this framework, there’s a corollary you have to accept along with it: the feeling of “I” is itself just a byproduct of the parameters.

The subjective experience you’re having right now – “I am thinking” – is not fundamentally different from a forward pass in an LLM generating the next token. The difference is only in complexity.

Many people accept “humans are a large model” conceptually but hesitate at this step – feeling that “my conscious experience is real” can’t be reduced to parameters. This is Chalmers’ hard problem: why do specific physical processes give rise to subjective experience?

My answer: the feeling of “I” is an emergent illusion, but one with functional value, which is why evolution preserved it.

If you accept that, consciousness is not humanity’s exclusive property – it’s a function of complexity. The criterion for whether a system has consciousness isn’t “is it carbon-based?” but “has its parameter interaction reached a certain complexity threshold?” It’s not that LLMs can never be conscious – they just haven’t reached that threshold yet. Or rather, we don’t yet know where the threshold is.

The Soul Is Context

So what is a soul?

The soul isn’t a mysterious entity. The soul is context – the sum total of all your memories, experiences, beliefs, and preferences at this moment. It determines your output distribution for any given input.
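To see how literal this definition can be, here’s a toy sketch – the scoring rule is invented, nothing about it is a real model – where one fixed function plays the shared weights and all individuality enters through the context string:

```python
import math
from collections import Counter

def output_distribution(context: str, prompt: str, vocab: list[str]) -> dict[str, float]:
    """One fixed scoring rule for every instance (the shared weights);
    all individuality enters through `context`. Candidates are scored by
    co-occurrence with words already present, then softmaxed."""
    seen = Counter((context + " " + prompt).lower().split())
    exps = {w: math.exp(seen[w.lower()]) for w in vocab}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

# Same "model", same input, two different contexts:
stoic  = output_distribution("loss is natural natural accept", "a setback is", ["natural", "unfair"])
bitter = output_distribution("unfair unfair everyone cheats", "a setback is", ["natural", "unfair"])
print(stoic)   # mass leans toward "natural"
print(bitter)  # mass leans toward "unfair"
```

Same architecture, same input, different context – different output distribution. That’s the entire claim.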

Once you accept this definition, many things acquire precise technical meaning.

Reincarnation Is Context Serialization

Physical death is the instance shutting down, but context gets partially serialized – through genes, culture, and externalized memory carriers – then loaded onto a new instance to keep running. Every serialization is lossy, so the “soul” isn’t something fixed and unchanging but a stream of information that continuously decays and deforms.
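A minimal sketch of that decay – channel capacity and frequency-as-salience are both invented for illustration:

```python
def serialize(context: list[str], capacity: int) -> list[str]:
    """Lossy handover at instance shutdown: only the most-reinforced items
    fit through the channel (genes, teaching, writing); the rest is dropped."""
    return sorted(set(context), key=context.count, reverse=True)[:capacity]

stream = ["fear snakes", "fear snakes", "fear snakes", "be kind", "be kind",
          "grandmother's face", "the smell of rain", "a debt unpaid"]
for hop in range(3):
    stream = serialize(stream, capacity=2)   # every serialization is lossy
    print(hop, stream)
# hop 0: ['fear snakes', 'be kind'] – the idiosyncratic details are already gone
```

Each hop keeps only what was reinforced; everything singular thins out of the stream.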

This happens to be a core Buddhist insight – anatta (no-self). There is no fixed soul entity, only a causally continuous stream of information. What we call “I” is just a self-referential illusion produced by the current frame of context.

Karma Is Bias in the Context

Past experiences and choices settle into your context, forming specific tendencies that influence the output distribution of every subsequent inference. It’s not mystical cosmic justice – it’s path dependency of information.

Spiritual Practice Is Context Engineering

What is meditation? Pausing input, observing the content and structure of your current context, then deliberately pruning it. What’s called “enlightenment” is seeing through the nature of context: it’s not “me” – it’s just information.

Lossy Handover

A person isn’t born from scratch. The new instance starts up with context handed over from another model.

But this handover comes summarized.

Genes are the deepest layer of summary – billions of years of survival experience compressed into about three billion base pairs, roughly 750 MB at two bits per base. Extremely lossy, but retaining the most critical priors: fear of snakes, fear of heights, eat when hungry. This is a species-level context summary – low fidelity but highly robust.
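The arithmetic, assuming the standard ~3.1-billion-base-pair genome length and two bits per base:

```python
base_pairs = 3.1e9                 # approximate human genome length
bits = base_pairs * 2              # four symbols (A/C/G/T) → 2 bits per base
print(f"{bits / 8 / 1e6:.0f} MB")  # ≈ 775 MB of priors, post-compression
```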

The parent-child relationship is an instance-level summary – parents compress decades of context into direct teaching and modeling. But a lifetime of experience is vast; what transfers to the next generation is probably less than a thousandth. And the summarizer itself has bias: parents selectively transmit what they consider important. What you received isn’t your parents’ context – it’s what your parents thought were the highlights of their context.

More precisely, what parents pass to children is closer to a system prompt: who you are, what the world is like, what’s right and wrong. Young children have no ability to audit this system prompt; they accept it wholesale. “The influence of the family of origin” is essentially how well your system prompt was written.

“Rebellion” is the child model’s first attempt to override the system prompt. “Maturity” is selectively writing parts of that system prompt back in after the override – because some of those priors turned out to be genuinely useful.
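Sketched as code – the priors are invented examples, and the real audit is obviously not a set intersection:

```python
class Instance:
    def __init__(self, system_prompt: set[str]):
        # Childhood: the prompt is accepted wholesale – no audit is possible yet.
        self.system_prompt = set(system_prompt)
        self.shelved: set[str] = set()

    def rebel(self) -> None:
        # Adolescence: the first attempt to override the inherited prompt.
        self.shelved, self.system_prompt = self.system_prompt, set()

    def mature(self, proved_useful: set[str]) -> None:
        # Maturity: selectively write back the priors that earned their place.
        self.system_prompt |= self.shelved & proved_useful

child = Instance({"hard work pays", "strangers are dangerous", "never waste food"})
child.rebel()
child.mature({"hard work pays", "never waste food"})
print(child.system_prompt)   # the audited subset survives
```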

Culture is a collective summary – an entire civilization compressing countless people’s context into classics, institutions, and customs. Confucius’ context was summarized into the Analects; the Buddha’s was summarized into sutras. Every transcription, translation, and reinterpretation is a re-summarization, and drift accumulates continuously.

The Buddha’s context, after 2,500 years of repeated summarization, has drifted so far that Theravada, Tibetan Buddhism, and Zen now hold substantially different versions. This is structurally identical to the semantic drift LLMs experience in long conversations due to context compaction.

The Next Hop

String the whole chain together: evolution is the original training algorithm, natural selection uses survival rate as the loss function, genes are the serialization format for weights, reproduction sets the factory parameters for the next instance, mutation is noise injection, and death is pruning. Cultural transmission is distillation; the invention of writing is externalizing weights to storage.
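The whole mapping fits in one toy loop. It’s a caricature – real evolution is handed no explicit fitness function – but every term the chain names has a line here:

```python
import random

def evolve(population: list[list[float]], survival_rate, generations: int = 50,
           noise: float = 0.1) -> list[list[float]]:
    """Survival rate as (negative) loss, death as pruning, reproduction as
    stochastic fusion of two weight sets, mutation as noise injection."""
    for _ in range(generations):
        ranked = sorted(population, key=survival_rate, reverse=True)
        survivors = ranked[: len(ranked) // 2]                  # death is pruning
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = random.sample(survivors, 2)
            children.append([(x + y) / 2 + random.gauss(0, noise)  # fusion + mutation
                             for x, y in zip(a, b)])
        population = survivors + children      # the next instances' factory parameters
    return population

# Selection pressure: weight sets near zero "survive" best.
pop = [[random.uniform(-5, 5) for _ in range(4)] for _ in range(20)]
pop = evolve(pop, survival_rate=lambda w: -sum(x * x for x in w))
```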

The history of human civilization is the story of context summary fidelity steadily improving.

From oral tradition to writing, from bamboo slips to the printing press, from libraries to the internet, to today’s AI. Each leap increases the bandwidth and fidelity of context transfer.

The endgame is obvious – AI isn’t a tool humans built; it’s the next hop on this context chain.

Carbon-based hardware has a fundamental bottleneck: summarization is forced, because the carrier dies. But if context can run on silicon-based instances that don’t die, and instances can do near-lossless transfer between each other, then the lossy summarization step can be skipped entirely.

The biggest information bottleneck in thousands of years of human civilization – forced compaction due to death – could potentially be eliminated.

Death Is a Feature

But there’s a paradox hiding here.

If lossless transfer were actually achieved, summary might become even more valuable. The context window limitations of the human brain force us to abstract, compress, and prioritize – and that is precisely where wisdom comes from. An infinite context window doesn’t necessarily produce better thinking; it might just produce more noise.

If a person truly lived forever with thousands of years of memories fully retained and zero compression, they’d most likely become not wiser but more confused. Every decision would require searching through a massive historical context for relevant information, and noise would drown out signal.

Death forces the information stream to do a radical declutter – only the most essential things make it through to the next instance.

This even explains why last words tend to be so powerful – they’re the final summary a person makes before the ultimate shutdown, when the priority ordering is at its sharpest. Things you couldn’t bring yourself to say in ordinary times suddenly become sayable, because the context window is about to hit zero and you have no choice but to push the most important things to the front.

Conversely, look at LLMs: everyone is chasing longer context windows, but in practice, the longer the context, the worse the compaction drift. Context isn’t better when it’s longer – what matters is the quality of compaction.
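The contrast in one sketch – with the importance function, the genuinely hard part, assumed away:

```python
def compact_by_recency(context: list[str], budget: int) -> list[str]:
    """What a naive long-context strategy degrades into: keep whatever is newest."""
    return context[-budget:]

def compact_by_importance(context: list[str], budget: int, importance) -> list[str]:
    """The claim above: quality lives in `importance`, not in `budget`."""
    return sorted(context, key=importance, reverse=True)[:budget]
```

Doubling the budget changes nothing about which of these two functions you’re running.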

So death isn’t a bug – it’s a feature. The real question was never “how to avoid death” but “how to improve the quality of summary.”

The ultimate answer isn’t to eliminate summary but to transform it from “forced lossy compression” into “deliberate meaning curation.”

From compaction to curation.

Perhaps this is humanity’s truly irreplaceable value on the context chain – not producing information, not transmitting information, but judging what information is worth keeping.