
We Forgot to Teach AI Agents to Be Wrong on Purpose

What old logic papers know about the scientific process that challenges your AI scientist

If you are a scientist like I am, or a clinician, let me portray a likely scenario. You are excited about your AI Scientist / Research Agent. So you set it up for a research task.

Your agent reads four hundred papers before breakfast, proposes a hypothesis, designs the experiment, writes the code, runs it, and hands you a result it is ostensibly pleased with.

You feel like Tony Stark telling Jarvis, “Thrill me!” and Jarvis replying, “The render is complete.” A little ostentatious, don’t you think?

Then, a new finding lands on the table that flatly contradicts the hypothesis. You feed it in. The agent thinks for a moment and gives you, well, a revised answer. Suave. Maybe its confidence ticks down, maybe an internal ranking reshuffles, maybe it just rewrites the paragraph in a slightly more hedged voice. Its mind, if we could call it a mind, has been changed.

Now ask it the only two questions that actually matter in science:

Which of your beliefs just took the hit?
What, exactly, would change your mind back?

It cannot tell you. Not because it’s stupid (it is, embarrassingly, smarter than you on several fronts) but because the reasoning that produced the first answer and the reasoning that produced the second both happened inside a forward pass, in the guts of a machine (an LLM) that transiently activated a stupendously large nexus of artificial neurons in both cases, mostly indistinguishably. You can ask it to narrate its reasoning, and it will, fluently and at length, in the confident cadence of Jarvis. But that narration is generated after the fact and isn’t the mechanism of reasoning. What you want is an argument.

This is the gap I want to talk about. A small group of logicians and philosophers worked out exactly what’s missing here, in detail, with theorems, before some of the people building today’s AI scientists were born. The newest paper I’m going to lean on is from 2004. The oldest is from 1987. They are not famous. They are, I will argue, the missing manual.

Science is a game with three logic moves

Forget agents for a second and ask a more basic question: what is a mechanism of doing science?

Charles Sanders Peirce — American polymath, logician, the kind of person who invented three fields before breakfast and died broke — answered this in the 19th century. Claudio Delrieux dusted it off and formalized it beautifully in a 2004 paper with the unglamorous title Abductive inference in defeasible reasoning: a model for research programmes. Peirce’s answer is that scientific reasoning is a triad of three different inference types which run in a specific order:

Induction gathers the evidence. You collect observations. This is the move large language models (LLMs) are pretty good at, because the whole training objective is to compress a planet’s worth of regularities. Induction is probabilistic.
Abduction finds the best explanation for the evidence. You see a surprising fact and reverse-engineer a story that would make it unsurprising. Peirce called this the source of every genuinely new idea, the only one of the three that adds content instead of rearranging it. For example, your lawn is wet this morning. You abduce that it may have rained last night.
Deduction then predicts what else must be true if your explanation is right, so you can go test it. If it rained, the car must be wet too. Let’s check the car. Deduction is a favorite of mathematicians. Axioms and rules lead to proofs.

And then you loop. The deduction sends you back out to gather more evidence, which throws up something your explanation didn’t predict, which forces a new explanation, which makes new predictions, and so on.

The surprising fact is: every step of this loop is defeasible. None of it is unassailable. Meaning, each conclusion is held provisionally, with the sane expectation that tomorrow’s evidence might revoke it. Delrieux’s whole project is a reasoner that can draw useful conclusions even when the knowledge it’s working from isn’t known to be true — even when it’s known to be false. Read that twice. That’s not a bug. That is precisely what doing science before you’ve figured it out feels like.

So defeasible is the important idea here. Before we go further, let’s get definitions right. The three Peircean moves are summarized in the table. Notice the last column. Each inference, however different, gets defeated, and it gets defeated in a way that is characteristic of its kind. I have framed the examples to appeal to the perspective of a clinician. You can reframe to suit your field of expertise.

Kind	What it does	The call at the bedside	How that very call gets defeated
Deduction	Necessary, monotonic consequence: if the premises hold, the conclusion must hold.	Potassium’s back at 7.2. Severe hyperkalemia stops hearts, so we need to treat it now!	The patient feels fine and the ECG is very normal. Wait! The sample hemolyzed in the tube and the 7.2 is an artifact. The clinician does not doubt her logic; she doubts a premise and retracts her former deduction.
Induction	Ampliative. Generalize from many cases to the next one, assigning probabilities.	Every child I’ve seen with this exact runny-nose-and-fever has had a harmless virus. Most likely this child does too.	The child comes back with a stiff neck and a purplish rash. Tests reveal meningitis. Generalization is gone. Black swan.
Abduction	Inference to the best explanation. Adopt the most explanatory story and then go test it.	Crushing chest pain, sweating, radiating to the left arm, age sixty. Best running hypothesis: heart attack.	The pain is tearing, it bores through to the back, and the pulses in the two arms do not match. It is an aortic dissection, and that reverses the treatment. The winning hypothesis lost to a rival that explained more.

Notice what every row had in common: a conclusion you would genuinely act on. Each conclusion was withdrawn when new evidence turned up and exposed a dubious premise, a black swan, or a better explanation. That withdrawal is defeasibility, and it is the crux of good reasoning hygiene.

Note that defeasible is not a distinct category of inference. It is rather a mode: Deduction, Induction, and Abduction can all be defeasible. The central thesis of this essay is that today’s AI scientists perform Deduction and Induction fairly well and Abduction moderately well, but are bad at this defeasible mode.

How, exactly, does a conclusion get defeated? There are two ways. For that we go back to 1987.

Two ways to lose an argument

John Pollock’s 1987 paper Defeasible Reasoning opens with the most pedestrian example imaginable and it’s worth reproducing it.

Something looks red to you. That’s a perfectly good reason to believe it is red — Pollock calls it a prima facie reason, good to hold true until further notice. Now, two completely different things can go wrong with that belief.

Way one: someone produces irrefutable evidence that the thing is actually white; your eyes tricked you. You now have a reason to believe the opposite of your conclusion. Pollock calls this a rebutting defeater: it attacks the conclusion head-on. It’s the kind everybody already understands, and the kind almost all early AI modeled. Evidence for and against; we settle it on the merits of the evidence.

Way two is sneakier and far more important. Someone tells you the room is lit by a red light. They have not given you a single reason to think the thing isn’t red — it might well be. What they’ve poisoned is the link between “looks red” and “is red.” Under red lights, looking-red stops carrying information about being-red. Pollock calls this an undercutting defeater — it attacks the inference, not the conclusion.

Let’s ponder how different these are. A rebutting defeater changes your answer. An undercutting defeater changes whether you’re entitled to an answer at all: it cuts the wire between premise and conclusion while leaving both endpoints untouched.
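To make the asymmetry concrete, here is a minimal Python sketch of the red-object example. The class names, the `status_after` function, and the string labels are illustrative assumptions of mine, not Pollock's notation. All it shows is that the two defeater kinds must produce different end states.

```python
from dataclasses import dataclass
from enum import Enum


class DefeaterKind(Enum):
    REBUTTING = "rebutting"        # gives a reason for the opposite conclusion
    UNDERCUTTING = "undercutting"  # breaks the link from premise to conclusion


@dataclass(frozen=True)
class Inference:
    premise: str
    conclusion: str


@dataclass(frozen=True)
class Defeater:
    kind: DefeaterKind
    reason: str


def status_after(inference: Inference, defeater: Defeater) -> str:
    """Return what the reasoner is entitled to say once the defeater lands."""
    if defeater.kind is DefeaterKind.REBUTTING:
        # The answer itself changes: we now hold the opposite conclusion.
        return f"not ({inference.conclusion})"
    # Undercut: premise and conclusion are both untouched, but the premise
    # no longer supports the conclusion, so we hold no answer at all.
    return "undetermined"


looks_red = Inference(premise="it looks red", conclusion="it is red")
white_proof = Defeater(DefeaterKind.REBUTTING, "independent evidence that it is white")
red_light = Defeater(DefeaterKind.UNDERCUTTING, "the room is lit by a red light")

print(status_after(looks_red, white_proof))  # not (it is red)
print(status_after(looks_red, red_light))    # undetermined
```

The point of carrying the kind as a field is that a downstream planner can branch on it. A single scalar confidence cannot represent the difference between "the opposite is true" and "I no longer know".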

Here’s the same distinction with the stakes turned up. A patient has low back pain; the prima facie read is mechanical, treat conservatively. Two very different things can arrive:

The rebutting defeater walks in as a red flag: unexplained weight loss, night pain that wakes them, a history of breast cancer. That’s a reason for the opposite conclusion — it argues the problem is not mechanical, and attacks the diagnosis directly.
The undercutting defeater is the one that gets people sued. You order a test to rule something out. It comes back negative. Normally that would lower your suspicion. But this particular patient was on steroids that masked the inflammation, so the negative result tells you nothing. It doesn’t argue the patient is fine. It severs the link between test negative and all clear. Nothing looks wrong. The number came back but the reasoning from that number is dead.

The undercutting case is the dangerous one precisely because nothing looks wrong, and a reasoner that only tracks rebutting defeaters — only ever asking “do I have evidence against my conclusion?”— sails straight past it.

Now go back to our agent. Its answer shifted: a confidence slid, a ranking reshuffled, a paragraph got more hedged. Which kind of defeat was that? Did the new evidence argue against the hypothesis (rebutting), or reveal that the experiment producing the original evidence was confounded (undercutting)? Those demand opposite next moves — revise the theory versus throw out the measurement and redo it.

When two good arguments fight it out

Fine — track defeaters explicitly. Now you’ve got the problem that gets interesting: what do you do when two perfectly respectable lines of reasoning point in opposite directions and neither defeats the other?

The canonical example is so good it has a name: the Nixon Diamond. Nixon was a Quaker, and Quakers are (defeasibly) pacifists. Nixon was also a Republican, and Republicans are (defeasibly) not. Both inferences are fine. Both fire. They collide. Was Nixon a pacifist?

Horty, Thomason, and Touretzky took this seriously in their 1990 paper A Skeptical Theory of Inheritance in Nonmonotonic Semantic Networks, and the idea is sharp: when defeasible chains conflict, you have a genuine fork, and you must pick a temperament.

The credulous reasoner picks a side. It’ll hand you an answer — Nixon was a pacifist! — and if you ask again it might cheerfully hand you the opposite.
The skeptical reasoner refuses to conclude. Two arguments cancel? Then it suspends judgment and says so loudly. The paper argues that in many scenarios this is the grown-up move: the reasoner should never claim a conclusion it didn’t earn.
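The two temperaments can be sketched in a few lines of Python. This is a deliberately small toy, not the path-based inheritance theory from the paper: it assumes one-step defaults only, and the literal encoding (a leading "-" for negation) and all function names are my own. It enumerates the maximal consistent belief sets, then defines credulous acceptance as "in at least one" and skeptical acceptance as "in all of them".

```python
from itertools import combinations

# Facts we are not negotiating, and one-step defaults: (prerequisite, conclusion).
# A leading "-" marks negation. All names here are illustrative.
facts = {"quaker", "republican"}
defaults = [("quaker", "pacifist"), ("republican", "-pacifist")]


def negate(literal):
    return literal[1:] if literal.startswith("-") else "-" + literal


def consistent(literals):
    return all(negate(lit) not in literals for lit in literals)


def extensions(facts, defaults):
    """Maximal consistent belief sets reachable by firing applicable defaults."""
    applicable = [concl for pre, concl in defaults if pre in facts]
    found = []
    for size in range(len(applicable), -1, -1):  # try the largest sets first
        for chosen in combinations(applicable, size):
            beliefs = set(facts) | set(chosen)
            if consistent(beliefs) and not any(beliefs <= e for e in found):
                found.append(beliefs)
    return found


def credulous(literal, exts):
    """Accepted if at least one extension contains it."""
    return any(literal in e for e in exts)


def skeptical(literal, exts):
    """Accepted only if every extension contains it."""
    return all(literal in e for e in exts)


exts = extensions(facts, defaults)
print(len(exts))                     # 2
print(credulous("pacifist", exts))   # True
print(credulous("-pacifist", exts))  # True
print(skeptical("pacifist", exts))   # False
print(skeptical("quaker", exts))     # True
```

The credulous reasoner can be talked into either answer, while the skeptical one keeps only what both extensions agree on (here, the facts themselves) and reports the rest as open.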

Here’s why it matters for machines. Every reasoning system has a temperament, whether or not its builders chose one. Your agent has one too — it’s just implicit, undocumented, and drifting with the stochasticity inherent in token generation.

Clinicians have a less flattering name for an invisible credulous temperament: anchoring, or premature closure. The history pulls toward two live explanations and the rushed clinician grabs the first that fits, commits, and stops looking. That’s a credulous reasoner. The skeptical move is to hold both as a conscious policy and order the thing that discriminates between them. The diagnostic-error literature is, more or less, a catalogue of what happens when a high-stakes reasoner is silently credulous. You want that temperament to be a setting and not an unknown.

A short interlude concerning the platypus

I want to slow down on one example, because it’s the one I keep coming back to, and it teaches the single most counterintuitive lesson in this whole essay.

Here is a chain a child could follow. Mammals bear live young — true, unequivocally true; dogs, whales, bats, you. The platypus is a mammal — also true; fur, milk, warm-blooded, the works. Therefore: the platypus bears live young. They are viviparous.

The platypus lays eggs. Oviparous.

Notice what didn’t happen. Neither premise was false. The platypus really is a mammal; mammals really are, as a rule, viviparous; the inference was warranted. And yet it produced a flatly wrong conclusion. This is the canonical case from the inheritance-with-exceptions literature (Touretzky built whole formal systems around exactly this shape), and it’s the cleanest possible demonstration that a defeasible generalization is not a broken universal law. It’s a different kind of object entirely. Mammals are viviparous was never a theorem. It’s a default: true unless this case says otherwise, and the platypus is exactly the case that says otherwise.

A black-box reasoner does something quietly tragic when it meets the platypus. It experiences the contradiction as noise — two signals disagree, so it averages them into a noncommittal hedge. Probably viviparous, most likely. That’s the worst possible response, because it treats the single most informative event in the whole exchange as an error.

The contradiction is not the problem. The contradiction is the discovery. The platypus case is your knowledge telling you where its own categories are too coarse. The right response isn’t to soften the rule. It’s to refine the structure: realize that reproductive mode is an axis, that viviparous and oviparous are on it, and that mammals bifurcate. You come out the far side of the contradiction with a better map than you went in with, monotremes filed exactly where they belong. The anomaly upgraded our knowledge.
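Here is a minimal sketch of what "refine the structure" can look like in Python. It assumes a single-parent taxonomy and the simplest specificity rule (the nearest class that states a value wins). The dictionaries and the `inherit` function are hypothetical names for illustration, and a real inheritance network with multiple parents needs more machinery than this.

```python
# child -> parent links; "monotreme" is the refinement the anomaly forced.
PARENT = {"platypus": "monotreme", "monotreme": "mammal", "dog": "mammal"}

# Each class may state a value on the reproductive-mode axis.
REPRODUCTION = {"mammal": "viviparous", "monotreme": "oviparous"}


def inherit(kind, prop, parent=PARENT):
    """Walk up from the most specific class; the first stated value wins.

    Returns (value, source_class) so the answer can be audited.
    """
    node = kind
    while node is not None:
        if node in prop:
            return prop[node], node
        node = parent.get(node)
    return None, None


print(inherit("dog", REPRODUCTION))       # ('viviparous', 'mammal')
print(inherit("platypus", REPRODUCTION))  # ('oviparous', 'monotreme')
```

The default on mammals is still there and still fires for the dog. Nothing was softened. The exception lives on a more specific node, and the returned source class says which rule produced each answer.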

That completely inverts the black-box instinct. Probabilistic systems treat a surprising contradiction as something to be explained away, averaged. A reasoner that keeps its priors and defeaters as explicit, inspectable objects treats the same contradiction as something to be resolved and learned from. One of them ignores the platypus case. The other says oh, interesting, and goes to reorganize the zoo.

Guess which one is doing science.

The same platypus case, running backwards

The story so far ran top-down, from a category to a property. It’s a mammal; mammals are viviparous by default; therefore, defeasibly, so is this one. That’s a default inference. It hands you a belief to act on, and it’s defeated when a more specific fact (the eggs) overrides the inherited default. Direction of travel: rule → case.

But hiding inside the same animal is a different reasoning act, running the other way, and it’s abduction. Rewind to 1799, when the first specimen reached the British Museum and the zoologist George Shaw picked it up. He had a bundle of bewildering observations (a duck’s bill, a beaver’s tail, mole-like fur, webbed feet) and reasoned backwards to the hypothesis that would best account for the bundle. His first, entirely sensible abduction: this is a hoax. Some taxidermist in the colonies must have sewed a duck’s bill onto a mole. (The era was thick with stitched-together fakes, so this was a good inference.) Shaw took a pair of scissors to the pelt and went hunting for the stitches. The scientist had generated a hypothesis and gone about testing it.

That difference — belief-to-act-on versus hypothesis-to-go-test — is the heart of it. Peirce, who named abduction, insisted it doesn’t deliver belief at all. What an abduction sanctions is, in his phrase, a reason to suspect — a ticket to investigate and not a conclusion to stow away. The Stanford Encyclopedia files abduction squarely under defeasible reasoning. Abduction is defeasible, but it is not the same thing as defeasible default reasoning. Defeasibility is the property — conclusions can be retracted. Abduction is a particular kind of inference — observations → best explanation — that happens to have that property. Lay the two side by side on the one animal:

The default act: treat platypus as viviparous until told otherwise. Direction: rule → case. Gives you a belief to act on. Defeated by a more specific fact.
The abductive act: what’s the best story for this bizarre animal— and let me go check. Direction: observations → explanation. Gives you a hypothesis to investigate. Defeated not by one fact but by a better-explaining rival, weighed across all the evidence at once.

Now watch them meld, because they’re the gears of Peirce’s loop. Abduction proposes the category (“it’s a real animal, and a mammal”). The default inheritance predicts a property (“so: viviparous”). An observation defeats the default (“oviparous”). And that defeat is precisely the anomaly that kicks abduction back into gear: if it’s a mammal and it lays eggs, what kind of mammal explains that? — and the answer, a monotreme, an early-branching lineage that simply never gave up oviparity, is a redrawn category. Notice who did the repair. The default inheritance couldn’t fix itself; it just generated a contradiction. It took abduction — the holistic, weigh-everything-at-once move — to resolve the collision by redrawing the map.

Then there’s the part that should keep anyone building a reasoning machine up at night. The mammals are viviparous default wasn’t merely strong, it was strong enough to be wielded as a weapon against correct evidence. For the better part of a century, Aboriginal Australians and settlers reported that the platypus laid eggs, and European naturalists dismissed them, because the default outranked the testimony. The prior was quietly running an undercutting defeater, in Pollock’s exact sense, against every report that disagreed with it: the witnesses must be mistaken, because mammals don’t lay eggs. It took until 1884, and eggs physically in hand, for the naturalist William Caldwell to break the spell with a famously terse telegram from Queensland: Monotremes oviparous, ovum meroblastic. That’s eighty-five years later! Do not underestimate the perils of strong priors.

That is the precise failure mode to dread in a machine with superhuman priors. An AI scientist’s defaults will be stronger than Shaw’s, mined from a planet of text, and a system that cannot distinguish “I’m applying a default this case might override” from “I’m dismissing a result because it disagrees with my prior” will do to your anomalous-but-correct data exactly what Europe did to the platypus. And a black box genuinely cannot tell those apart. The default getting overridden, the abductive hunch you ought to test, and the over-zealous prior destroying the inconvenient observation—the correct next action differs completely in these scenarios:

Default overridden? Refine the taxonomy.
Have an abductive suspicion? Go run Caldwell’s experiment — go find the eggs.

When your priors are busy undercutting the testimony that disagrees, that’s exactly where good science is peeking.

The plot twist: deduction is the fragile one

Now I get to ruin your intuitions.

Everybody knows the hierarchy of inference. Deduction sits on top, gleaming, certain — true premises, guaranteed conclusion, no take-backs. Mathematical.

Stephen Biggs and Jessica Wilson, in a 2004 chapter with the gleefully provocative title The Indefeasibility of Abduction, argue the picture is upside down.

The trap is a near-universal assumption: that reasoning is defeasible if and only if it rests on a logically invalid argument. Under that assumption deduction (always valid) is automatically safe, and abduction (always invalid in the strict sense) is automatically risky. Clean. Tidy. Wrong.

Watch what happens when you take Pollock’s rebutting defeaters seriously and apply them to deduction itself. You run a valid deductive argument. The conclusion is absurd — it contradicts something you’re far more sure of than the premises.

This happens constantly in real science: you derive a result so ugly you know in your bones something upstream is broken. What do you do? You don’t accept it. You run the argument backwards as a rebutting defeater against one of your own premises: if the chain is valid and the conclusion is false, a premise must go. That’s reductio ad absurdum, the oldest move in the book — and it means deduction can be defeasible. Its conclusions can be defeated while the inference stays perfectly valid.

Clinicians run this backward move daily. The assay says a normal (D-dimer) blood test rules out a clot. The result is normal. The valid deduction says: no clot. But the patient in front of you looks exactly like a pulmonary embolism, and your confidence in that gestalt outranks your confidence that the assay’s premises hold for this patient. So you run the chain backwards and doubt a premise rather than your clinical intuition and observation. The deduction was valid; you defeated it anyway because something more holistic outranked it.

Abduction, by contrast, Biggs and Wilson argue, is holistic: when you infer the best explanation, you’re already weighing the whole field of rival explanations against all the evidence at once. There’s no lone exposed conclusion for a rebutting defeater to snipe, because the inference already absorbed everything a rebuttal could raise. New evidence doesn’t rebut an abduction so much as re-run the competition of logical explanations. That makes abduction, in their phrase, the ultimate arbiter of any domain it operates in: when deduction and abduction disagree, abduction wins, because abduction is what decides which premise to throw out. (This is exactly what happened to the platypus: the deductive default couldn’t repair itself but the holistic re-explanation did. Note: not every philosopher buys this; the holism premise is precisely where critics push.)

An AI scientist that treats its formal derivations as bulletproof and its hypotheses as disposable guesses has its epistemics exactly backwards. A system that trusts the prover and discounts the abducer will defend a broken premise to the death because the deductive logic checked out.

So is any of this buildable?

Enter the paper that belongs on more whiteboards: Bondarenko, Dung, Kowalski, and Toni, An abstract, argumentation-theoretic approach to default reasoning (1997). This is the one that brings a whole zoo of scary formalisms under one roof, but here is the crux of it.

The machine has shockingly few parts:

A boring, monotonic base logic — the settled facts you’re not negotiating.
A set of assumptions: the defeasible leaps of the form “in the absence of evidence otherwise, believe this.”
An assumption can be attacked if its contrary can be derived, possibly with help from other assumptions.
A set of assumptions is admissible if it’s conflict-free (doesn’t attack itself) and defends itself — it can counter-attack whatever attacks it.

That’s the whole substrate. Beliefs are admissible sets of assumptions, defended. Everything above lives here naturally: Pollock’s defeaters are attacks; an undercutting defeater attacks the assumption that licenses an inference rather than the conclusion. The Nixon Diamond is two assumption-sets attacking each other symmetrically, and credulous-versus-skeptical is just which extension you compute — a maximal defended set, or only what survives in every defended set.
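The four parts above fit in a short Python sketch. Treat it as a toy under stated assumptions: it covers only the flat case (assumptions never appear as rule heads), it enumerates every subset of assumptions and so only scales to tiny examples, and the class, method, and sentence names are mine rather than the paper's. The first framework encodes the red-light undercut, the second the Nixon Diamond.

```python
from itertools import chain, combinations


class ABAFramework:
    def __init__(self, rules, assumptions, contrary):
        self.rules = rules                    # (head, [body...]); empty body = fact
        self.assumptions = frozenset(assumptions)
        self.contrary = contrary              # assumption -> sentence contradicting it

    def closure(self, assumed):
        """Everything derivable from the rules plus the given assumptions."""
        derived = set(assumed)
        changed = True
        while changed:
            changed = False
            for head, body in self.rules:
                if head not in derived and all(b in derived for b in body):
                    derived.add(head)
                    changed = True
        return derived

    def attacks(self, attackers, targets):
        """A set attacks another if it derives the contrary of one of its members."""
        derived = self.closure(attackers)
        return any(self.contrary[t] in derived for t in targets)

    def subsets(self):
        items = sorted(self.assumptions)
        return [frozenset(c) for c in chain.from_iterable(
            combinations(items, n) for n in range(len(items) + 1))]

    def admissible(self, candidate):
        if self.attacks(candidate, candidate):
            return False  # not conflict-free
        # It must counter-attack every set of assumptions that attacks it.
        return all(self.attacks(candidate, other)
                   for other in self.subsets()
                   if self.attacks(other, candidate))

    def preferred(self):
        """Maximal admissible sets of assumptions."""
        adm = [s for s in self.subsets() if self.admissible(s)]
        return [s for s in adm if not any(s < t for t in adm)]


# Undercut: the lighting report attacks the assumption that licenses the inference.
red = ABAFramework(
    rules=[("looks_red", []),
           ("is_red", ["looks_red", "perception_reliable"]),
           ("red_lighting", ["lighting_report"])],
    assumptions={"perception_reliable", "lighting_report"},
    contrary={"perception_reliable": "red_lighting",
              "lighting_report": "report_is_false"},
)
print(red.preferred())                               # [frozenset({'lighting_report'})]
print("is_red" in red.closure(red.preferred()[0]))   # False

# Nixon Diamond: two assumptions attacking each other symmetrically.
nixon = ABAFramework(
    rules=[("quaker", []), ("republican", []),
           ("pacifist", ["quaker", "normal_quaker"]),
           ("not_pacifist", ["republican", "normal_republican"])],
    assumptions={"normal_quaker", "normal_republican"},
    contrary={"normal_quaker": "not_pacifist", "normal_republican": "pacifist"},
)
views = [nixon.closure(ext) for ext in nixon.preferred()]
skeptical = set.intersection(*views)
print(len(views))          # 2
print(sorted(skeptical))   # ['quaker', 'republican']
```

In the red-light case nothing ever derives "not red". The conclusion is simply no longer defended, which is what an undercut should look like. In the Nixon case a credulous reasoner picks one of the two preferred sets, and the skeptical intersection keeps only the facts.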

The bookkeeping for explicit, auditable, defeater-aware reasoning was specified in this work, with semantics and existence theorems. We don’t lack the theory. We ignored it because next-token prediction was easier and, for a while, more impressive.

How do we do science in practice?

Now assemble the whole thing into the picture Delrieux was building toward, where AI scientist becomes something you could actually design.

Delrieux models a theory the way Imre Lakatos did: as a research programme with a defended structure.

At the center is the hard core — Newton’s laws. The genetic code.
Around it, a protective belt of auxiliary hypotheses — the negotiable assumptions that take the hits so the core doesn’t.
A negative heuristic: when bad evidence arrives, don’t aim it at the core — absorb it into the belt.
A positive heuristic: proactively grow and systematize the belt, ideally turning yesterday’s ad hoc patch into tomorrow’s principled consequence of the core.

So what happens if an observation arrives that the theory doesn’t merely fail to predict but actively forbids? A surprising observation is one the theory is silent on. An anomalous one is something it positively rules out.

When the anomaly hits, the programme faces a choice, and the entire health of the science rides on how it’s made:

Absorb it into the belt. This is what Ptolemy did for centuries — every time the planets misbehaved, another circle upon circle.
Let it reach the core. Admit the central commitment is in trouble. Copernicus.

The difference between a progressive programme and a degenerating one — between science and increasingly desperate bookkeeping — is how it routes incoming anomalies through its own defended structure.

To my taste, that routing decision is the science. It is the single most important reasoning act a scientist performs.

It runs on a doctor’s exam table every day too. A working diagnosis is a little research programme. The diagnosis is the hard core. The protective belt is every auxiliary move that explains away what doesn’t fit: the pain isn’t improving because the patient is deconditioned; the numbness is just referred; and so on. Each is a legitimate auxiliary hypothesis, and each is exactly the epicycle a degenerating diagnosis hides behind. The anomaly that should be allowed to reach the core is the progressive weakness, the night sweats, the patient who fails to respond the way the diagnosis insists they must.

When your agent meets an anomalous result, something in there decides whether to quietly patch the hypothesis (grow the belt) or question the framing (touch the core). It makes the Ptolemy-or-Copernicus call on every observation. You can’t audit the routing, because there’s no structure amenable to such introspection.
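For contrast, here is a minimal sketch of what an auditable routing structure could look like, using the exam-table example. It is an illustration under my own assumptions, not Delrieux's formalism. The `Programme` class, its method names, and the choice of "mechanical low back pain" as the hard core are all hypothetical. The sketch does not decide where an anomaly should go. It only guarantees that the decision leaves a record.

```python
from dataclasses import dataclass, field


@dataclass
class Programme:
    hard_core: str
    belt: list = field(default_factory=list)  # auxiliary hypotheses
    log: list = field(default_factory=list)   # audit trail of routing decisions

    def absorb(self, anomaly, auxiliary):
        """Negative heuristic: explain the anomaly with a new belt hypothesis."""
        self.belt.append(auxiliary)
        self.log.append(("belt", anomaly, auxiliary))

    def escalate(self, anomaly):
        """Let the anomaly reach the core: the central commitment is in question."""
        self.log.append(("core", anomaly, None))

    def core_challenged(self):
        return any(target == "core" for target, _, _ in self.log)


dx = Programme(hard_core="mechanical low back pain")
dx.absorb("pain is not improving", "patient is deconditioned")
dx.absorb("numbness", "the numbness is referred")
dx.escalate("progressive weakness")

for target, anomaly, auxiliary in dx.log:
    print(target, "|", anomaly, "|", auxiliary)
print(dx.core_challenged())  # True
```

With a log like this, a reviewer can count how many anomalies were patched into the belt before one was allowed to reach the core, which is exactly the question a black box cannot answer.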

The glass box is what’s missing

Before I poke at things and play the contrarian: the current crop of AI scientists is pretty impressive. These tools have made real, validated findings. However, most of them are built around a tree search, and a tree search is an explore-versus-exploit machine. It is structurally incapable of representing which logical branch is in conflict with which, and what defeater decides between them. When a result contradicts an earlier assumption, the tree doesn’t route the anomaly through a defended structure.

A 2026 case study of autonomous-research frameworks found that every system produced what the authors politely called sophisticated hallucinations, and — the killer — that inside multi-agent pipelines those hallucinations get structurally integrated into plans and write-ups, so you can no longer separate genuine computation from confident fabrication. Their recommendation? Explicit separation between speculative and computed statements.

A critical analysis of Sakana AI’s AI Scientist noted, “It heavily depends on user input, struggles with methodological soundness, and lacks the ability to critically assess its own results. […] AI models inherit biases from historical data and cannot independently distinguish between scientific quality and consensus.”

And that is precisely my worry. The frontier systems reinvented the choreography of defeasible argumentation — propose, critique, debate, rank, evolve — then discarded the bookkeeping that would make it auditable, keeping a score where a defended structure should be.

Couldn’t we do better?

What I have been building

I have been playing with these ideas for a while now in collaboration with a small team. I decided to focus first on an AI medical assistant in the Physical Therapy (PT) field. We narrowed it further to a school of PT called Mechanical Diagnostics and Therapy (MDT, McKenzie Method).

Let me walk through some of the key ideas. My criteria for a true glass-box AI agent are that:

it should be able to organize knowledge and hypotheses in a hierarchy (of general to specific) as we do in our brains
it should be able to track evidence states defeasibly across those hierarchies while recording their contradictions (Nixon diamonds)
its defeasible reasoning should be auditable through the temporally evolving evidence world of the clinician-patient
it should guarantee, within a world of hypotheses, that the deductive, inductive and abductive reasoning paths can be systematically explored, and can cover the complete set of hypotheses.

Well, what is the space of evidence? Let’s start with a clinical example.

Let the hypothesis H be: This patient is allergic to penicillin.

Notation	Evidence State	Meaning
H⁺	Asserted: data/source affirms H	Documented prior anaphylaxis to penicillin. The evidence establishes the allergy as true.
H⁻	Rejected: data/source affirms ¬H	Allergy testing negative. Patient later tolerated amoxicillin.
H⁰	Source-silent: no statement (open world)	Allergy field in chart was left blank; that does not mean no allergy.
φ	Unresolved: conflicting support, unsettled	One note says allergic, another says patient tolerated it.

If we could record the evidence state of every hypothesis, it would create the right mathematical structure for defeasible reasoning across the three types (deduction, induction, abduction).
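A minimal Python sketch of the four states, using the penicillin example. The combination rule is my assumption, chosen to match the table: a silent source changes nothing, agreeing sources keep their value, and an assertion meeting a rejection becomes unresolved. The enum and function names are hypothetical, and the sketch ignores time and defeat, so a later undercut of one of the notes is out of scope here.

```python
from enum import Enum
from functools import reduce


class Evidence(Enum):
    ASSERTED = "H+"     # a source affirms H
    REJECTED = "H-"     # a source affirms not-H
    SILENT = "H0"       # no statement; open world, so not a denial
    UNRESOLVED = "phi"  # sources conflict


def combine(a, b):
    """Merge two reports about the same hypothesis (assumed rule, see lead-in)."""
    if a is Evidence.SILENT:
        return b
    if b is Evidence.SILENT:
        return a
    if a is b:
        return a
    # ASSERTED meets REJECTED, or anything meets UNRESOLVED.
    return Evidence.UNRESOLVED


def evidence_state(reports):
    """Fold all reports; with no reports at all the state is SILENT."""
    return reduce(combine, reports, Evidence.SILENT)


blank_chart = [Evidence.SILENT]
documented = [Evidence.SILENT, Evidence.ASSERTED]
conflicting_notes = [Evidence.ASSERTED, Evidence.REJECTED]

print(evidence_state(blank_chart).value)        # H0
print(evidence_state(documented).value)         # H+
print(evidence_state(conflicting_notes).value)  # phi
```

The design choice that matters is keeping SILENT distinct from REJECTED. A blank allergy field must never be read as "no allergy", and a two-valued field cannot express that.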

Picture a deterministic tool that sits exactly where the agent’s reasoning currently is inscrutable, and instead makes the whole Peircean loop into first-class, inspectable objects:

Hypotheses that are explicit assumptions — things you can name, version, and attack.
Defeaters that carry their type (this evidence rebuts the conclusion and that one undercuts the experiment that produced it), so an anomaly does not collapse into regression-to-the-mean behavior by the agent.
A conflict-resolution temperament you set on purpose — skeptical when stakes are high, credulous when brainstorming — instead of one that drifts with the agent LLM you use.
A hard core and a protective belt that actually exist as structure, as a governance policy that cannot be violated. When an anomalous result arrives you can watch the system decide where it lands — and overrule it when it’s about to quietly bury a result that should have rattled the core.
A system that treats a contradiction correctly: not as noise to be averaged away, but as a signal that the categories need refining — a collision between a default and a fact logged as a constructive event, the thing that makes the hypothesis world better.

The glass box answers the two questions: this hypothesis took the hit, the defeat was an undercut — the confound was in this evidence, not the hypothesis or theory — and here is the one experiment that would put it back. A proper argument.

The details of the data structure we built are a story for another day, and frankly the internals are the least interesting part. The interesting part is the shape of the gap, and the gap is this:

You may know the story of the physicist Pauli. According to Peierls, on seeing a paper by a young physicist, Pauli remarked sadly, “It is not even wrong.”

Being wrong requires defeasible reasoning. We don’t want our AI scientists and AI medical assistants to be not even wrong. We really don’t want Pauli to turn in his grave.

I have tried to link most of the papers and sources in the post. All underlines are links you can follow.

June 18, 2026

To dose, or not to dose: that is the question
How China’s Investigator-Initiated Pathway Is Rewriting the Validation Trajectory for Cell/Tissue Targeted Medicines

A perfect storm

In September 2025, the New England Journal of Medicine published as Correspondence the first clinical data that would have seemed implausible five years ago¹. Five patients with refractory systemic lupus erythematosus had received an intravenous infusion containing messenger RNA encoding a CD19 chimeric antigen receptor, packaged inside a lipid nanoparticle engineered to deliver its cargo specifically to T cells. The patients’ own T cells, reprogrammed in their own bodies, attacked the B cells driving the disease. Four of the five had lupus nephritis. All showed deep B-cell depletion. None needed the toxic chemotherapy conditioning that conventional CAR-T therapy requires.

The drug was called HN2301. The company was MagicRNA, based in Shenzhen. The trial was an investigator-initiated study (IIT) — meaning the local hospital ethics committee, not China’s national drug regulator, had cleared it. From a regulatory standpoint, it was the kind of trial a clinician runs to test a hypothesis. From a scientific standpoint, it was the first published clinical evidence that you can manufacture CAR-T cells inside a patient’s body.

In March 2025, AstraZeneca had paid up to a billion dollars to acquire a small Belgian biotech called EsoBiotec. The asset that justified the price was a similar in vivo CAR-T concept, ESO-T01 — co-developed with Shenzhen’s Pregene Biopharma — whose first patient had been dosed in November 2024 at Union Hospital in Wuhan, part of the Tongji Medical College system at Huazhong University of Science and Technology, under principal investigator Heng Mei. The trial was multi-center, investigator-initiated, with a planned enrollment of up to 24 patients with relapsed/refractory multiple myeloma. The dosing was announced publicly on January 8, 2025. Two months later, AstraZeneca acquired the company.

In late April 2025, a Shanghai company called YolTech reported interim results from its own investigator-initiated trial of YOLT-101, an in vivo base editor for heterozygous familial hypercholesterolemia, in which the company’s proprietary adenine base editor², packaged in a GalNAc-conjugated lipid nanoparticle (LNP), converts a single nucleotide in the PCSK9 gene of hepatocytes. The trial, run at Renji Hospital of Shanghai Jiao Tong University, had enrolled six subjects across three dose cohorts. PCSK9 levels fell by more than 70% from baseline in the higher-dose groups. LDL cholesterol reductions were durable through at least 24 weeks of follow-up, with the longest individual reaching 36 weeks. Five weeks after the data readout, the U.S. FDA cleared YolTech’s investigational new drug application to run the same study in the United States.

These are not isolated events. The industry is evolving very rapidly. First-in-human (FIH) means something completely different today than it did even a year or two ago. These events are just the surf from the turbulent waves that have left US biotech reeling. The investigator-initiated pathway is quietly becoming the fastest route for a new molecule or a novel therapeutic idea to reach interpretable human data. To understand why this matters (for patients, for investors, and for anyone designing the next generation of cell-targeted medicines), you have to understand the system that produced them.

Two tracks to First-in-Humans

Since 2017, China has run a dual-track regulatory system for cell and gene therapies. One track is the conventional one: an industry-sponsored Investigational New Drug application reviewed by China’s NMPA. This is the rough equivalent of the U.S. FDA’s Center for Drug Evaluation and Research (CDER). The other track is the Investigator-Initiated Trial, or IIT, overseen by China’s National Health Commission and gated only by an institutional ethics committee at a licensed hospital.

The IIT is not unique to China. The United States has investigator-initiated studies too. What is unique is the combination of three things:
1. The legal status of IITs as a recognized regulatory pathway for novel modalities, including cell and gene therapy
2. The scale of the Chinese hospital system willing to run them
3. The data infrastructure to publish and license the products that this system can output

The statistical fingerprint of this system is striking. An analysis published in Frontiers in Pharmacology in early 2026³ identified 10,373 cell therapy clinical trials worldwide.

This table says it all!

Region	Cell therapy trials (Oct 2025)	Phase III share	Early Phase I share
United States	3,563	4.4%	1.7%
China	3,365	1.6%	21.1%
Europe	1,584	10.5%	0.5%

What this means is that China is not running the same trials the West runs. China is running a different kind of trial — earlier, smaller, more exploratory, less expensive — at enormous volume. An earlier analysis of 953 Chinese gene-and-cell therapy trials published in 2022⁴ found that investigator-initiated studies “far exceeded” industry-sponsored ones in every category except in vivo gene therapy, where the regulatory bar is structurally higher.
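
The shares in the table are easier to feel as absolute numbers. The sketch below multiplies each region's trial count by its published share. Because the shares are rounded to one decimal place, the resulting counts are approximations, and the variable and function names are illustrative only.

```python
# Trial counts and shares copied from the table above (Oct 2025 snapshot).
trials = {
    "United States": {"total": 3563, "phase3_share": 0.044, "early_phase1_share": 0.017},
    "China": {"total": 3365, "phase3_share": 0.016, "early_phase1_share": 0.211},
    "Europe": {"total": 1584, "phase3_share": 0.105, "early_phase1_share": 0.005},
}


def approx_counts(row):
    """Convert percentage shares into approximate absolute trial counts."""
    return {
        "phase3": round(row["total"] * row["phase3_share"]),
        "early_phase1": round(row["total"] * row["early_phase1_share"]),
    }


counts = {region: approx_counts(row) for region, row in trials.items()}
for region, c in counts.items():
    print(f"{region}: ~{c['early_phase1']} early Phase I, ~{c['phase3']} Phase III")

# How many early Phase I trials China runs for each one in the United States.
china_vs_us = counts["China"]["early_phase1"] / counts["United States"]["early_phase1"]
print(f"China / US early Phase I ratio: ~{china_vs_us:.1f}x")
```

On these figures China has roughly 710 early Phase I cell therapy trials against about 61 in the United States and about 8 in Europe, even though the US and Chinese totals are nearly the same.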

This produces a peculiar economy. Conventional Western drug development assumes that a Phase I trial is a $5–15 million undertaking that takes 12 to 24 months to launch after a successful IND filing. A Chinese IIT for a cell-targeted construct, run at a major academic hospital, can be initiated in 3 to 9 months at a cost of $300,000 to $1.5 million. The EsoBiotec story is the canonical example: hypothesis, registration, first patient, acquisition — three months from human data to a billion-dollar exit.
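
A quick back-of-the-envelope check on how large that gap is, using only the ranges quoted in the paragraph above. The helper names are illustrative, and taking the midpoint of each range is a simplifying assumption rather than anything reported in the sources.

```python
# Ranges quoted in the text: cost in USD millions, launch time in months.
phase1_us = {"cost": (5.0, 15.0), "months": (12, 24)}
iit_china = {"cost": (0.3, 1.5), "months": (3, 9)}


def ratio_range(a, b):
    """Smallest and largest possible a/b, given (low, high) ranges for each."""
    return a[0] / b[1], a[1] / b[0]


def midpoint(r):
    return (r[0] + r[1]) / 2


cost_lo, cost_hi = ratio_range(phase1_us["cost"], iit_china["cost"])
time_lo, time_hi = ratio_range(phase1_us["months"], iit_china["months"])

# Midpoint-to-midpoint ratios (an assumption: the ranges say nothing about
# where a typical trial actually falls).
cost_mid = midpoint(phase1_us["cost"]) / midpoint(iit_china["cost"])
time_mid = midpoint(phase1_us["months"]) / midpoint(iit_china["months"])

print(f"cost: {cost_lo:.1f}x to {cost_hi:.0f}x cheaper (midpoints: {cost_mid:.1f}x)")
print(f"time: {time_lo:.1f}x to {time_hi:.0f}x faster (midpoints: {time_mid:.1f}x)")
```

At the midpoints the IIT comes out roughly eleven times cheaper and three times faster to launch. The ranges are wide, so the advantage for any single program could be much smaller or much larger.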

A second statistical fingerprint sharpens the picture. The same paper found that 43% of China’s cell-therapy trials use genetically modified cell products, meaning cells engineered with an inserted transgene, an edited gene, or both, rather than cells used in their native state. In the United States and the European Union, the genetically modified share is closer to 19%. The Western cell-therapy portfolio is more heavily weighted toward unmodified modalities: hematopoietic stem-cell transplants, mesenchymal stem cells, tumor-infiltrating lymphocytes. The Chinese portfolio, by contrast, leans hard into constructed cell therapies: CAR-T, CAR-NK, TCR-T, and the in vivo variants of all three. China runs roughly twice the proportion of engineered-cell trials that the West does. Engineered cell therapies are the modality class whose value is determined by the targeting molecule each cell expresses: the CAR’s antigen-binding domain, the TCR’s recognition sequence, the homing peptide on the LNP that delivers the genetic payload. So China is dominating in therapies whose primary value is in how precisely they find their cellular target.

The modalities that benefit

The IIT pathway is producing first-in-human data across at least eight different modality classes, each of which depends on a different solution to the same underlying problem: how do you get a payload — a gene editor, a piece of RNA, a radioactive atom, a cytotoxic small molecule — into a specific kind of cell?

In vivo CAR-T

This is the modality that has captured the most attention. Following MagicRNA’s work are at least three other clinical-stage programs. Genocury Biotech, also in Shenzhen, reported a complete remission in a patient with refractory diffuse large B-cell lymphoma after a single dose of its in vivo CD19 CAR-T. The trial was run at Tongji Hospital in Wuhan under principal investigator Jia Wei. EsoBiotec’s BCMA program is now an AstraZeneca asset. A fourth program, registered in January 2026 by Daihong Liu of the PLA General Hospital in Beijing, uses a polymer-lipid hybrid nanoparticle⁵ to deliver mRNA encoding a dual CD19/CD20 CAR.

In vivo gene editing

YolTech alone now has four programs with human or near-human data. Its base editor for familial hypercholesterolemia, YOLT-101, edits the PCSK9 gene in hepatocytes using a lipid nanoparticle delivery system. Its CRISPR-Cas program YOLT-201 targets the TTR gene in transthyretin amyloid cardiomyopathy. Anyone who has worked in the nucleic acid drug space should be very familiar with these targets! YOLT-203 treats primary hyperoxaluria type 1. YOLT-202, for alpha-1 antitrypsin deficiency, has FDA Regenerative Medicine Advanced Therapy designation. A second Chinese company, AccurEdit, has reported up to 70% LDL-cholesterol reduction from a single dose of its own base-editing therapy. A third, Base Therapeutics in Shanghai, has registered two oncology programs.

What is striking about this cluster is that the programs are liver-focused, and the liver is where LNPs accumulate anyway. The clinical successes so far are for liver-expressed targets (PCSK9, TTR, alpha-1 antitrypsin). The frontier, the editing of cells anywhere else in the body, is purely a problem of finding the right targeting ligand.

So US biotech can still innovate. The bottleneck on every in vivo gene editor is the same — targeting to cells and tissues. Wouldn’t it be amazing if we could specifically target the brain, muscle & cardiac tissues, immune system, or any solid tumor? What if we could efficiently design peptides, aptamers, or a polymeric system that can decorate the LNP and redirect it?

In fact, aptamers form a third cluster, dominated almost single-handedly by the laboratory of Weihong Tan, now at the Hangzhou Institute of Medicine. In 2023, the Tan group published the first-in-human pharmacokinetic study of a synthetic DNA aptamer in Research, a Science Partner Journal⁶. The aptamer, called SGC8, was radiolabeled with gallium-68 via a NOTA chelator and injected intravenously into cancer patients at Renji Hospital in Shanghai under hospital ethics committee approval. It bound the cell-surface receptor PTK7. It was, in the most literal sense, a designed targeting molecule visualized inside human bodies.

The Tan lab’s more recent work has taken a different turn. A September 2025 preprint described what the group calls Apt-circRNA: a circular RNA molecule with aptamer sequences embedded directly into its structure. I have written about this in an earlier blog post, see here. The aptamer acts as the targeting moiety for the circular RNA payload. There is no lipid nanoparticle. There is no carrier of any kind. The construct is both the medicine and the address it travels to. In mice, the Apt-circRNA, loaded with tumor antigen, drove antigen presentation in dendritic cells and cleared established tumors.

Radioligand Therapies

This is the fourth cluster, and arguably the lowest-friction route to human data for a new targeting polymer. Chinese nuclear medicine departments routinely run investigator-initiated trials of novel peptide ligands labeled with diagnostic or therapeutic isotopes. A paper published in the Journal of Medicinal Chemistry in early 2026 described the first-in-human evaluation of a novel PSMA-targeting radioligand whose key feature was a modified amino acid (a beta-3 amino acid linker) designed to reduce off-target uptake in the kidneys and salivary glands⁷.

The trial was both first-in-human and investigator-initiated. An EJNMMI paper from October 2025 described an investigator-initiated dose-escalation trial of a fibroblast-activation-protein-targeted radioligand in patients with advanced sarcoma and other refractory cancers. Another such trial is running at Nanjing First Hospital.

The drug-conjugate families (antibody-drug conjugates, peptide-drug conjugates, antibody-oligonucleotide conjugates, radionuclide conjugates, small-molecule drug conjugates, immunostimulatory antibody conjugates, antibody-degrader conjugates) all share the same three-part architecture: a targeting ligand, a linker, an effector. Targets are appearing in Chinese trials before they appear in Western ones.

mRNA and circRNA cancer vaccines form a fifth cluster, where the targeting question is whether the antigen-coding RNA reaches the right antigen-presenting cell. StemiRNA Therapeutics in Shanghai has received CDE approval for SW0715, an mRNA encoding IL-12 formulated in a lipopolyplex, a lipid-polymer hybrid carrier. Academic groups at Fudan, Tsinghua, and Mengchao Hepatobiliary are running circRNA neoantigen vaccine programs against hepatocellular carcinoma and HPV-driven cancers.

What unites these clusters is not a shared technology but a shared innovation bottleneck. In every case the value-creating element of the medicine is the molecule that directs it to its cellular destination. The CAR, the aptamer, the targeting peptide on a radioligand, the antibody on a conjugate, the engineered envelope of a virus, the surface protein of an exosome, the ligand on a lipid nanoparticle. Designing these targeting molecules is the rate-limiting step. Validating them in humans is the value-creating step. And it is the latter that the Chinese IIT pathway has compressed by an order of magnitude.

The mechanics of how it all moves so fast

How does a Chinese IIT actually move so fast? It is worth understanding the mechanics.

Three structural factors do most of the work.

The first is hospital sponsorship. An IIT in China is sponsored by the institution running it — typically the principal investigator and the hospital’s clinical research unit. The investigator submits a protocol to the hospital’s ethics committee, which assesses safety, scientific rationale, and ethical considerations. If the committee approves, the trial can proceed. There is no parallel review by NMPA, no IND filing, no requirement for a sponsor company.

The legal sponsor is the investigator. This eliminates an entire layer of regulatory interaction that, in the United States, typically consumes 12 to 18 months between protocol design and first patient dosed.

The second is the sheer density of the hospital infrastructure. Twenty-five percent of the world’s top 200 research hospitals by Nature Index share are in China. Many of the major academic centers — Renji Hospital, Tongji Hospital, PLA General Hospital, Peking Union Medical College Hospital, Fudan Zhongshan, Nanjing First — have established cell therapy and gene therapy units with experienced clinical research staff, on-site manufacturing capacity, and ethics committees that have evaluated dozens of novel-modality protocols.

The third is cost. Clinical operations in China cost roughly 30 to 40 percent of equivalent operations in the United States, and for early-phase exploratory trials with small patient cohorts and short follow-up, the gap can widen further. CRO labor, hospital bed-days, and GMP manufacturing are all systematically less expensive. A small IIT can be funded out of a company’s seed-stage budget. A Phase I IND in the U.S. typically cannot.

China’s NMPA has joined the International Council for Harmonization, which has the effect of aligning Chinese review standards with international ones for trials that do graduate to industry-sponsored status. Beijing announced in April 2025 that it would process investigational drug applications in 30 working days, down from 60. The ecosystem there has been actively engineered to compress timelines, whereas the US has fought against mRNA vaccines and technology.

The result is a kind of clinical-development arbitrage that did not exist five years ago. A biotech founder with a novel cell-targeting construct now has a choice. Path A is the conventional one: raise $20–30 million in a Series A, spend two years and another $10–20 million on IND-enabling studies, file an IND with the FDA, recruit U.S. sites, and wait. Path B is the new one: design the construct, find an academic principal investigator at a Chinese hospital who shares the scientific interest, fund a small IIT, and dose your first patient inside of nine months for under a million dollars. The data from Path B will not get you a U.S. approval. But it will tell you, with real human evidence, whether your construct works.

For an investor that distinction changes the entire risk profile. The biggest question in early-stage biotech investing is whether the company’s preclinical model translates to humans. A team who can answer that question with human data — even exploratory, even small — is selling a fundamentally different proposition than a team who can only point to mice. Welcome to the brave new world of drug discovery!

The deal-side perspective

The translation of this scientific reality into commercial activity has been swift, but it has taken two distinct forms — and the difference between them reveals where the industry is heading.

The EsoBiotec story is the cleanest small-scale example. In December 2024 the company initiated an investigator-initiated trial of ESO-T01 at multiple Chinese sites. The first patient was dosed in January 2025. In March 2025 AstraZeneca acquired the company for up to a billion dollars in cash and milestones. The asset that justified the price was not a Phase I dataset in the traditional sense — it was a small, IIT-derived signal that in-vivo CAR-T could work for multiple myeloma. AstraZeneca did not need a U.S.-quality IND package. It needed conviction.

The most consequential transaction of 2025, however, was something larger and structurally different. On July 28, 2025, GSK announced an agreement with Jiangsu Hengrui Pharma — one of China’s largest pharmaceutical companies — to develop up to twelve innovative medicines across respiratory, immunology, inflammation, and oncology. GSK paid $500 million upfront. The total potential value, if all twelve programs are optioned and all milestones met, is approximately $12 billion, plus tiered royalties on global sales outside Greater China. The lead asset is HRS-9821, a PDE3/4 inhibitor in clinical development for chronic obstructive pulmonary disease. The other eleven programs are not yet in the clinic; Hengrui will develop each of them through Phase I, including the recruitment of patients outside China, and GSK holds the option to take any of them global at the end of each Phase I.

The structure of the GSK-Hengrui deal says something the industry has been moving toward but rarely articulates so clearly: a top-five global pharma is treating a Chinese pharma’s discovery and early-clinical engine as a strategic source of pipeline. Not as a one-off vendor of a single asset, but as a portfolio builder.

The financial structure mirrors the scientific reality: the value-creating step — getting a new molecule into a human and seeing what it does — is being run in China.

A second, parallel transaction in the same year traced a longer arc and made the same point in a different way. In November 2023, BioNTech licensed the ex-China rights to a bispecific antibody called PM8002 from Biotheus, a Zhuhai-based biotech founded in 2018, for $55 million upfront. In November 2024, BioNTech acquired Biotheus outright for $800 million plus $150 million in milestones, gaining global rights to the molecule it had renamed BNT327. In June 2025, BioNTech licensed BNT327 to Bristol Myers Squibb for $1.5 billion upfront, $2 billion in fixed payments through 2028, and up to $7.6 billion in milestones — a total deal value of up to $11.1 billion. BNT327 targets PD-L1 and VEGF-A simultaneously, an approach that early data suggest could outperform Merck’s Keytruda, the world’s best-selling drug. It is now in global Phase III trials for small-cell and non-small-cell lung cancer, with a triple-negative breast cancer trial slated for late 2025 and more than a thousand patients dosed across some twenty studies.

The arithmetic on BNT327 is worth dwelling on. A bispecific antibody developed in a Chinese biotech moved from a $55 million license to a deal worth up to $11.1 billion in less than two years — a more than 200-fold appreciation in stated transaction value.
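
Spelled out, with every figure taken from the two paragraphs above (all amounts in USD millions; variable names are illustrative):

```python
# November 2023: BioNTech licenses ex-China rights to PM8002.
license_upfront = 55

# November 2024: BioNTech acquires Biotheus outright.
acquisition_upfront = 800
acquisition_milestones = 150

# June 2025: BioNTech licenses BNT327 to Bristol Myers Squibb.
bms_upfront = 1_500
bms_fixed_payments = 2_000
bms_milestones = 7_600

bms_total = bms_upfront + bms_fixed_payments + bms_milestones
fold_increase = bms_total / license_upfront

# Elapsed time from the first license (Nov 2023) to the BMS deal (Jun 2025).
months_elapsed = (2025 - 2023) * 12 + (6 - 11)

# The most BioNTech could have paid to obtain the molecule.
biontech_max_outlay = license_upfront + acquisition_upfront + acquisition_milestones

print(f"BMS deal total: up to ${bms_total / 1000:.1f} billion")
print(f"Fold increase over the original upfront: {fold_increase:.0f}x")
print(f"Months elapsed: {months_elapsed}")
print(f"BioNTech maximum outlay: ${biontech_max_outlay} million")
```

The ratio sets an upfront payment against a maximum, milestone-inclusive figure, so it is a headline multiple on stated deal value rather than a realized return.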

The story continues. Merck paid up to $3.288 billion to license LM-299, another PD-1/VEGF bispecific, from China’s LaNova Medicines. AbbVie’s $2.1 billion acquisition of Capstan Therapeutics in 2025, for in vivo CAR-T lipid nanoparticle technology, was driven by precisely the modality MagicRNA had just published on. Lilly’s $2.2 billion January 2026 licensing deal with Profluent was for AI-designed CRISPR delivery components — the kind of asset that would naturally be validated in a Chinese IIT before any U.S. registrational study. Moderna signed a deal worth up to $1 billion in May 2025 to establish mRNA manufacturing and trials in China.

This is an emergent two-track pattern of pharmaceutical business development that did not exist five years ago.
- Track one is the asset purchase: a Western acquirer pays for a specific Chinese-validated molecule, sometimes through an intermediary.
- Track two is the portfolio collaboration: a Western pharma buys access to a Chinese discovery and early-clinical engine across multiple programs simultaneously, with optionality at the Phase I gate. The GSK-Hengrui deal is the canonical example.
Both tracks move capital westward, but the underlying architecture is the same — Chinese clinical validation, Western commercial valuation, and the arbitrage between them as the largest single source of pharma deal-making.

If you map the modified-polymer and cell-targeted therapeutics deals announced from 2023 through early 2026, the total upfront-plus-milestone value crosses $89 billion. In a growing number of cases, the first interpretable human data on that targeting molecule comes from a Chinese hospital, the molecule itself was discovered in a Chinese biotech, or both.

But it need not be so. US biotech can innovate and still leverage both of these tracks.

So where do we go next as a biotech founder?

If a biotech founder reading this wants to ruminate on possible paths, here we go:
- Identify the principal investigator before the protocol. The most productive entry points are academic clinicians who have already run novel-modality IITs and whose research interests align with the construct in question.
- China has one of the world’s deepest contract manufacturing benches for modified peptides, modified oligonucleotides, and conjugates. Worth considering for material sourcing, GMP etc. I have worked extensively in India for this.
- Scope out the regulatory boundaries. Two things matter here. The first is the Human Genetic Resources rule, updated by NHC in April 2024, which requires approval for any foreign access to Chinese-origin human genetic data. The second is the question of how the resulting IIT data will be used downstream — exploratory signal for the next funding round, supporting data for a U.S. IND filing, or asset-level validation for a strategic partnership. A founder who plans to use IIT data to anchor an acquisition needs to run the trial to a higher standard than one who plans only to use it for internal go/no-go.
- Budget considerations: A focused FIH IIT for a cell-targeted construct, with five to fifteen patients and a defined primary endpoint, can usually be done for between $500,000 and $1.5 million all-in, including manufacturing of the clinical-grade material. A bridging Phase 1 in the United States will cost an order of magnitude more.
- Chinese IIT data is exploratory by FDA and EMA standards. It does not, on its own, substitute for an IND-quality safety dataset for a U.S. filing.

Why this matters for the design of new medicines

The deeper story underneath the regulatory mechanics is a story about what kind of science the system rewards.

For most of the history of biotech, the bottleneck was the drug. You found a target. You discovered a molecule. You spent years and hundreds of millions of dollars proving the molecule was safe and effective. Validation was slow because the molecules were slow to design.

The molecules are becoming faster to design, though chemically modified ones are still very much a challenge. AI and computational chemistry methods for modified-polymer design are catching up fast. The bottleneck has moved upstream of synthesis: it is now the question of which molecules to design, which targets to hit on which cells. The bottleneck has also moved downstream: it is now the question of how quickly you can validate any one design in a human being.

China’s IIT system attacks the downstream bottleneck. It does not solve the upstream one. An innovator who wants to use the system productively still has to know which cell-surface protein to target on which tissue, and still has to design a molecule that will bind it well. But once those choices are made, the system shortens the validation cycle from years to months and from tens of millions of dollars to less than two.

This compression has consequences that ripple outward. It changes the economics of biotech seed financing — a company can credibly aim for human data on a milestone budget. It changes the structure of pharma business development — assets become acquisition candidates earlier in their development, on smaller datasets, demanding that we ask altogether different questions. It changes the pace of competitive dynamics — a Chinese in vivo CAR-T company can publish in NEJM before a U.S. competitor has dosed its first patient.

And it changes which scientific problems are worth attempting in the first place. If validation takes ten years and a hundred million dollars, you choose your targets conservatively. If validation takes a year and a million dollars, you can try things that previously would have been unjustifiable.

This last shift is the most important and the least discussed. The Chinese IIT pathway, applied at scale to cell-targeted medicine, is making it economically rational to design medicines for cellular addresses that no one has ever tried to deliver to before. The map of drugged cell-surface proteins on the human body is small — fewer than a dozen targets across the entire receptorome have been engaged by approved cell-targeted therapeutics. The map of potential targets is enormous, in the low thousands of cell-surface proteins. The arbitrage between those two numbers — between what has been done and what could be done — is what defines the next decade of cell-targeted medicine, and it is the IIT pathway that makes the arbitrage economically practicable.

A measured caveat

There are real risks in this story, and a fair-minded reader should know them.

China’s IIT data is not registrational data, and treating it as such has burned investors before. The quality of IIT trials varies widely: the major academic centers run them to international GCP standards, but smaller hospitals do not always. An innovator building a serious clinical program will need a Western regulatory consultant in the loop from day one, not as an afterthought when the IIT data lands.

The BIOSECURE Act, the U.S. legislation restricting federal contracts with certain Chinese biotech firms, remains in effect through 2026 and creates real friction for companies that intend to repatriate Chinese-developed assets to U.S. commercialization. A company planning to use the Chinese pathway should also plan a non-Chinese manufacturing footprint for any commercially bound asset.

The Human Genetic Resources rule complicates how IIT-derived human data flows into joint development agreements with Western pharma. Companies that fail to scope this early have lost deals over it.

Geopolitical risk is real and difficult to model. The same conditions that make the IIT pathway fast — institutional density, regulatory flexibility, low cost — sit on top of a U.S.-China relationship that is not stable.

The calm after the storm?

If the trajectory of the past three years continues — and there is little structural reason to expect it not to — three things will be visible from the outside by 2027.

First, the headline first-in-human data for the next wave of cell-targeted modalities will increasingly come out of Chinese hospitals.

Second, the deal-making patterns will see seismic shifts. Strategic acquirers will increasingly underwrite assets at the IIT data stage, not at the Phase I-complete stage. The size and shape of biotech Series A and Series B rounds will shift to accommodate this. A company that previously needed $80 million to get to a Phase 1 data readout may now need $25 million to get to IIT data plus a bridging plan. The implications for venture-stage biotech investing are large and under-appreciated.

Third, the choice of which targets to attempt will broaden. The hardest problem in cell-targeted medicine has always been identifying which receptors on which cells are worth the years and dollars of a development cycle. As that cycle shortens, the answer shifts from the same dozen targets everyone else is chasing to any target with a defensible scientific rationale and a designable targeting ligand. Foundation-model approaches to molecule design (for antibodies, for peptides, for aptamers, for chemically modified polymers) pair naturally with this expansion. A platform that can rapidly generate targeting molecules against new cell-surface antigens, combined with a rapid lab-in-a-loop cycle and a regulatory pathway that can rapidly test them in humans, is a different kind of business than the biotechs of the prior generation.

A note on sources
1. Wang Q, Xiao ZX, Zheng X, Wang G, Zha GF, Schett G, Chen Z, et al. “In Vivo CD19 CAR T-Cell Therapy for Refractory Systemic Lupus Erythematosus.” New England Journal of Medicine. Published September 17, 2025.
2. Wan P, Tang S, Lin D, Lu Y, Long M, Xiao L, Jiang Y, Liao J, Ma X, Liu Y, Yu W, Wang ZJ, Wu Y, Yang T, Xia Q. “Base Editing Gene Therapy for Heterozygous Familial Hypercholesterolemia.” medRxiv 2025.04.17.25325983. Posted April 17, 2025. DOI: https://doi.org/10.1101/2025.04.17.25325983
3. Wang M, Zhou T, Liu S, Xiang W, Xie K, Zhang X, Hu W, Fang M, Zhang Z, Chen M, Wang X, Wu J. “Global Panoramic analysis of clinical research in cell therapy: clinical trial landscape, marketed products, and regulatory trends.” Frontiers in Pharmacology, 9 February 2026. DOI: 10.3389/fphar.2026.1715984.
4. Yin C, Gao J, Li G, Hu H, Zhou L, Lu S, Chen X. “Gene and cell therapies in China: booming landscape under dual-track regulation.” J Hematol Oncol. 2022;15(1):139. doi:10.1186/s13045-022-01354-9. PMC: PMC9535931.
5. ClinicalTrials.gov. “Polymer-lipid Particle-delivered CAR1920 mRNA CAR-T (InViVoCAR1920) for B-cell Lymphoma/Leukemia.” Identifier: NCT07321301. Sponsor-investigator: Daihong Liu, Chinese PLA General Hospital, Beijing. Registered January 7, 2026. https://clinicaltrials.gov/study/NCT07321301
6. Ding D, Zhao H, Wei D, Yang Q, Yang C, Wang R, Chen Y, Li L, An S, Xia Q, Huang G, Liu J, Xiao Z, Tan W. “The First-in-Human Whole-Body Dynamic Pharmacokinetics Study of Aptamer.” Research (a Science Partner Journal). 2023;6:0126.
7. Gao X, Miao Y, Li L, et al. “Synthesis, Evaluation, and First-in-Human Study of a Novel PSMA Radioligand Bearing Beta3-Amino Acid Linkage.” J Med Chem. 2026;69(5):5610-5621. doi:10.1021/acs.jmedchem.5c02821
May 29, 2026

Building Brains from Polymers: The Quiet Revolution in Organic Neuromorphic Computing

In Altered Carbon, Richard K. Morgan wrote:

For all that we have done, as a civilization, as individuals, the universe is not stable, and nor is any single thing within it. Stars consume themselves, the universe itself rushes apart, and we ourselves are composed of matter in constant flux. Colonies of cells in temporary alliance, replicating and decaying and housed within, an incandescent cloud of electrical impulse and precariously stacked carbon code memory. This is reality, this is self knowledge, and the perception of it will, of course, make you dizzy.

Of all the colonies of cells, neurons are the strangest of them all. My first love in all of mathematical biology was the Hodgkin-Huxley model.

Every time you read a book, add a dash of smoked paprika to your sauce, or yell at your kids, your brain performs computational magic. It processes vast streams of sensory data and makes split-second decisions, all on roughly 20 watts of power. You need less power than a dim light bulb at the end of a seedy alley in a Batman movie.

Modern AI, by contrast, can demand megawatts. Training a single large language model can consume more electricity than a small town uses in a year. Yeah, you are directly causing global warming by brainstorming with your Claude on how to get away with murder, or escape to Timbuktu.

This staggering mismatch has forced engineers to ask a basic question: what if we stopped trying to simulate the brain on conventional hardware and instead built hardware that works the way the human brain actually does?

This is where the dream of neuromorphic computing comes in. And in the last three years, a wave of breakthroughs, many originating from labs in China, has brought us closer to answering it than I had realized. I thought I would write about it and spread my ignorance!

The von Neumann Bottleneck: Why silicon and your grey matter differ

Every conventional computer, from your phone to a data center, is built on the von Neumann architecture proposed in 1945. Its defining feature is a rigid separation between the processor (which computes) and memory (which stores). Data must constantly shuttle back and forth between the two over a shared bus. Obviously this is a mighty fine design. We all know that von Neumann was an alien whose mathematical super-intelligence makes your brain and mine look like a toaster next to a Voyager. Nevertheless, the von Neumann architecture creates an inherent bottleneck: the processor sits idle while data moves to and from memory.

Your brain has no such bottleneck. Its roughly 10¹¹ neurons and 10¹⁵ synapses perform processing and storage in roughly the same physical location. When a synapse strengthens with use, a process called long-term potentiation, it is simultaneously computing and remembering. There is no bus: computation is memory.

This architectural difference has profound consequences. The brain operates at frequencies around 10 Hz — millions of times slower than a modern CPU clock — yet outperforms supercomputers on tasks like real-time sensory integration. It achieves this through massive parallelism and extreme energy efficiency. A single synaptic event consumes on the order of femtojoules.

Neuromorphic engineering aims to close this gap by building physical hardware that mimics these principles.

A summary table may be useful to anchor it:

Compute system	Energy per synaptic event
Biological synapse	~1–10 femtojoules (10^-15 J)
Best OECT artificial synapse	~femtojoules to picojoules
GPU (floating-point multiply)	~picojoules to nanojoules
Full LLM inference per token	~joules (whole system)
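As a back-of-the-envelope check on the first row of the table, one can divide the brain's power budget by the number of synaptic events per second. The sketch below uses the round numbers quoted in this post (20 W, 10¹⁵ synapses, activity around 10 Hz) and assumes, as a simplification of mine, that every synapse is active at that same mean rate.

```python
# Back-of-the-envelope estimate of energy per synaptic event.
# Assumption (a simplification, not a measurement): all synapses share one mean rate.

def energy_per_synaptic_event(power_watts, n_synapses, mean_rate_hz):
    """Return joules per synaptic event for a given power budget."""
    events_per_second = n_synapses * mean_rate_hz
    return power_watts / events_per_second

brain_power_w = 20.0   # from the text: roughly 20 watts
n_synapses = 1e15      # from the text: roughly 10^15 synapses
mean_rate_hz = 10.0    # from the text: activity around 10 Hz

e_joules = energy_per_synaptic_event(brain_power_w, n_synapses, mean_rate_hz)
e_femtojoules = e_joules * 1e15
print(f"{e_femtojoules:.1f} fJ per synaptic event")
```

Under these assumptions the estimate lands at a couple of femtojoules, inside the 1 to 10 fJ band in the table. Lower the mean rate or the synapse count and the number moves up accordingly, so treat it as an order-of-magnitude sanity check only.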

The first serious attempts at neuromorphic hardware used conventional silicon. Projects like IBM’s TrueNorth, Intel’s Loihi, Stanford’s Neurogrid, and the University of Manchester’s SpiNNaker demonstrated that spiking neural networks could be implemented in CMOS circuits, achieving orders-of-magnitude improvements in energy efficiency per event compared to traditional processors.

But silicon neuromorphic chips have a fundamental problem: emulating a single synapse or neuron in CMOS typically requires more than ten transistors. This makes large-scale integration expensive, energy-hungry, and difficult to scale. Moreover, silicon is rigid, brittle, and biologically incompatible. Wouldn’t it be great if we could create direct interfaces with living tissue?

Organic & Polymeric Electronics

Can we make chips out of plastic? If we could, such flexible electronics could come much closer to mimicking true biological neurons. This has been the dream, and recent advances in n-type polymers bring it much closer to reality¹.

Organic mixed ionic-electronic conductors (OMIECs) are polymers that conduct both electrons and ions simultaneously. This dual conductivity is the key property that makes them uniquely suited for neuromorphic hardware. Biological neurons and synapses operate through ion flow: sodium, potassium, calcium, and chloride ions crossing membranes through voltage-gated channels. OMIECs can mimic the same electrochemical language.

The ionic conductance story of a neuronal spike, roughly

The organic electrochemical transistor (OECT) — a three-terminal device built from these materials — has emerged as the workhorse of organic neuromorphic electronics. A typical OECT consists of a polymer channel (the source-drain path), an electrolyte, and a gate electrode. When a voltage is applied to the gate, ions from the electrolyte migrate into the polymer channel, doping or de-doping it and changing its conductivity. The channel itself acts as an artificial synaptic cleft; the gate electrode mimics the presynaptic membrane; the drain collects the postsynaptic current.

The physics of this process can be described as follows. In the Bernards-Malliaras model, the drain current $I_{DS}$ in the linear regime follows²:

$I_{DS} = \mu C^* \frac{W T}{L} \left( V_{TH} - V_{GS} + \frac{V_{DS}}{2} \right) V_{DS}$

where $\mu$ is the carrier mobility, $C^*$ is the volumetric capacitance of the channel material, $W, T$ , and $L$ are channel width, thickness, and length, and $V_{TH}$ is the threshold voltage. It is the firing threshold equivalent of a neuron.

$V_{GS}$ is the gate-to-source voltage. It controls ion injection. This is the input knob: a positive $V_{GS}$ pushes cations into the channel, de-doping it. Compared to biological neurons, this is the presynaptic signal, equivalent to neurotransmitter release!

$V_{DS}$ is the drain-to-source voltage. It drives the current along the channel, pulling carriers from source to drain, and acts as the resting membrane potential.

In contrast, a regular MOSFET has a gate insulator which is a thin oxide. Charges pile up at the surface of the semiconductor, right at the oxide interface. The gate controls a 2D sheet of charge. This is interfacial doping.

An OECT is fundamentally different. There’s no insulator. The gate touches an electrolyte (salt water, a gel, a hydrogel), which touches the polymer channel directly. When you apply a gate voltage, ions physically migrate into the bulk of the polymer, doping or de-doping the entire volume. This is volumetric doping, which gives OECTs their hallmark properties: exceptionally high transconductance, low operating voltages (often below 1V), and a natural capacity for analog, multi-level conductance states.
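To get a feel for the numbers, the linear-regime expression quoted above can be evaluated directly. This is only a sketch: the geometry, threshold and bias values are hypothetical, chosen by me for illustration, and the $\mu C^*$ value of 50 is the PEDOT:PSS-scale figure mentioned in this post.

```python
# Linear-regime drain current of an OECT, using the expression quoted above.
# All device numbers below are illustrative assumptions, not measured values.

def oect_drain_current(mu_c_star, width_cm, thickness_cm, length_cm,
                       v_th, v_gs, v_ds):
    """I_DS = mu*C* (W*T/L) (V_TH - V_GS + V_DS/2) V_DS, in amperes.

    mu_c_star is in F cm^-1 V^-1 s^-1, lengths in cm, voltages in volts.
    """
    geometry = width_cm * thickness_cm / length_cm
    return mu_c_star * geometry * (v_th - v_gs + v_ds / 2.0) * v_ds

# Hypothetical device: W = 100 um, T = 100 nm, L = 10 um, converted to cm.
W, T, L = 100e-4, 100e-7, 10e-4
i_ds = oect_drain_current(mu_c_star=50.0, width_cm=W, thickness_cm=T,
                          length_cm=L, v_th=0.5, v_gs=0.0, v_ds=-0.1)
print(f"I_DS = {i_ds * 1e6:.0f} uA")
```

With these made-up values the current comes out at a few hundred microamperes from a tenth of a volt of drain bias. Because the current is linear in $\mu C^*$, swapping in a better channel material scales it by the same factor.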

A key figure of merit for OECT channel materials is the product $\mu C^*$ , which captures the combined electronic and ionic performance. It’s the carrier mobility times the volumetric capacitance.

In neuron terms, $\mu C^*$ measures how responsive the material is:

Can ions get in easily? ( $C^*$ ): analogous to how many ion channels a membrane has
Can the electrical consequence propagate quickly? ( $\mu$ ): analogous to how well-myelinated the axon is

And here is the recent advancement no one is talking about. Early polymeric materials like PEDOT:PSS achieved values around 50 $F \, cm^{-1} V^{-1} s^{-1}$ . Recent glycolated polythiophenes like p(g3T2-Te) have pushed this to 483 $F \, cm^{-1} V^{-1} s^{-1}$ , a nearly tenfold improvement in under a decade.

Artificial synapses

The most fundamental requirement for any neuromorphic device is synaptic plasticity. This is the ability to strengthen or weaken a connection based on activity, the physical basis of learning and memory.

In biological synapses, plasticity comes in two flavors. Short-term plasticity (STP) lasts milliseconds to seconds and reflects transient neurotransmitter dynamics. Long-term plasticity (LTP) persists for hours or longer and involves structural changes at the synapse — gene expression, protein synthesis, physical remodeling of dendritic spines.

OECT-based artificial synapses replicate both. STP arises naturally from the slow kinetics of ion injection and extraction: apply a voltage pulse to the gate, and ions dope the channel, changing its conductance. Remove the pulse, and the ions gradually diffuse back. The conductance decays to baseline over seconds. This is volatile memory, analogous to a fleeting sensory impression.
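The volatile behavior described above can be caricatured in a few lines. This is a toy model of mine, not a device model from the literature: each gate pulse adds a fixed conductance increment, which then relaxes exponentially back to baseline. The increment `delta_g` and time constant `tau_s` are hypothetical.

```python
import math

# Toy model of volatile (short-term) plasticity in an OECT synapse.
# Assumption: each gate pulse adds a fixed conductance increment that then
# relaxes exponentially to baseline. tau_s and delta_g are hypothetical.

def conductance_after_pulses(pulse_times_s, t_s, baseline=1.0,
                             delta_g=0.2, tau_s=2.0):
    """Conductance (arbitrary units) at time t_s after a train of gate pulses."""
    g = baseline
    for t_pulse in pulse_times_s:
        if t_pulse <= t_s:
            # Contribution of this pulse, decayed since it was applied.
            g += delta_g * math.exp(-(t_s - t_pulse) / tau_s)
    return g

# Paired pulses: the closer the second pulse, the higher the peak response.
peak_close = conductance_after_pulses([0.0, 0.5], t_s=0.5)
peak_far = conductance_after_pulses([0.0, 5.0], t_s=5.0)
late = conductance_after_pulses([0.0, 0.5], t_s=30.0)
print(peak_close, peak_far, late)
```

Two pulses close together stack on top of each other (a crude version of paired-pulse facilitation), while after many time constants the conductance is back at baseline and the "memory" is gone.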

Achieving LTP is harder and more interesting. Several strategies have emerged:

Electropolymerization. In a fascinating device reported by Gerasimov et al.³, an OECT built from the polymer ETE-S exhibited evolvable synaptic behavior. They showed that low gate voltages produced conventional STP through reversible electrochemical doping. Interestingly, sustained moderate voltages instead triggered electropolymerization of additional monomer within the channel, permanently altering its conductance. The synapse literally grew new conductive pathways during operation, a strong parallel to biological synaptogenesis!
Ion-blocking layers: Inserting physical barriers — metal-organic frameworks, lipid bilayers, nanofibrous polymer coatings — between the channel and electrolyte slows ion diffusion and traps charges in place after programming. A PEDOT:PSS/PEI architecture achieved non-volatile retention exceeding 25 hours, with the electrostatic potential barrier between oxidation states preventing spontaneous discharge⁴.
Crystallinity engineering⁵: Wang et al. built a vertical OECT whose channel has both crystalline and amorphous regions. Low gate voltages dope only the disordered amorphous zones, where ions slip in easily but slip out just as fast: volatile behavior, achieving STP. Crank the voltage higher and ions force their way into the tightly packed crystalline domains, where they get stuck: non-volatile behavior, achieving LTP. One device, two memory modes, selected purely by how hard you push. It stores 1,024 distinct conductance states and holds them for over 10,000 seconds: a 10-bit analog memory in a single transistor!

Artificial Neurons: Making Plastics Spike

Building artificial neurons from organic materials is not easy. We need to replicate the complex dynamics of biological action potentials!

The dominant circuit architecture is the leaky integrate-and-fire (LIF) model, implemented using complementary OECTs arranged as an Axon-Hillock (A-H) circuit. A membrane capacitance integrates input current. When the accumulated voltage crosses a threshold, a complementary inverter (built from paired p-type and n-type OECTs) fires a sharp output spike. A reset transistor then discharges the capacitor, restoring the resting state. The cycle repeats.
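The integrate, fire, reset cycle described above is easy to simulate. What follows is a minimal textbook LIF neuron in software, not a model of the OECT circuit itself; the capacitance, leak conductance, threshold and input currents are all illustrative values I picked.

```python
# Minimal leaky integrate-and-fire (LIF) neuron, integrated with forward Euler.
# All parameter values are illustrative assumptions, not the OECT circuit's values.

def lif_spike_times(i_in, c_mem=1e-6, g_leak=1e-6, v_thresh=0.5,
                    v_reset=0.0, dt=1e-4, t_end=1.0):
    """Return spike times (s) for a constant input current i_in (A)."""
    v = v_reset
    spikes = []
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        # The capacitor integrates the input while the leak pulls v back to zero.
        v += dt * (i_in - g_leak * v) / c_mem
        if v >= v_thresh:
            spikes.append((step + 1) * dt)   # threshold crossed: fire
            v = v_reset                      # reset discharges the capacitor
    return spikes

for current in (0.4e-6, 1e-6, 2e-6):
    n_spikes = len(lif_spike_times(current))
    print(f"I = {current * 1e6:.1f} uA -> {n_spikes} spikes in 1 s")
```

The point to notice is the input-to-frequency conversion: below a critical current the leak wins and the neuron stays silent, and above it the firing rate climbs with the input. That is the same rate coding the hardware neurons use.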

Early OECT neurons based on this design operated at frequencies below 2.4 Hz, which is far too slow to match even the slowest biological neurons. The breakthrough came with polymer engineering.

In a 2025 study published in PNAS, Yao et al.⁶ introduced Homo-gDPPTz, a new n-type OMIEC designed to closely match the performance of its p-type complement (gDPP-g2T) in vertical OECT complementary inverters. The resulting organic electrochemical neuron achieved a firing frequency spanning 0.13 to 147.1 Hz that can be calibrated — a range that covers the spectrum from slow vasoconstrictor neurons (below 1 Hz) to fast-spiking cortical neurons (over 100 Hz). This is more than 50 times broader than any previous OECT neuron circuit. The device footprint was less than 37 $mm^2$ , and energy consumption was just 4.7 nanojoules per spike.

The neuron was then integrated with pressure and strain sensors to build a complete neuromorphic perception system. Mechanical stimuli from a conductive foam sensor or a printed strain sensor were converted into input currents, which the OECT neuron transformed into frequency-modulated spike trains. These spikes were fed into an artificial synapse (also a vOECT), which modulated its postsynaptic current based on spike frequency, demonstrating the full sensing-encoding-processing loop of biological neural perception. Simulations of a spiking neural network built on these device parameters achieved 96% accuracy on handwritten digit classification.

An even more biologically faithful architecture was recently built using the conductance-based OECN (c-OECN). This emulated sodium and potassium ion channels, reproducing 15 of 20 known neuronal features including depolarization, repolarization, hyperpolarization, and threshold-dependent firing⁷. This design was coupled to a Na⁺ sensor and used to trigger vagus nerve stimulation in mice; a fascinating demonstration of closed-loop physiological regulation driven by an organic artificial neuron!

The Chinese Contribution: Printing Brains on Plastic

Just as Chinese researchers have made rapid progress in efficient and open-source LLMs, they have also made astounding progress in polymeric materials and engineering of neuromorphic computing.

Regionally controlled ion doping⁸. Li, Zhang et al. demonstrated an elegant strategy for building computing and memory on the same chip using identical materials. By controlling the thickness of inkjet-printed hydrogel electrolytes — thick (multi-layer) for computing units, thin (single-layer) for memory units — they created two functionally distinct OECT types from the same PEDOT:PSS channel material. The ion-rich OECT (thick electrolyte) features rapid ion transport, sub-millisecond response, and high transconductance, serving as a computing neuron. The ion-deficient OECT (thin electrolyte) has slow ion dynamics, wide hysteresis, and 300-second state retention, serving as a memory element.

Wow! Integrating volatile computing elements and non-volatile memory elements typically requires different materials, different fabrication processes, and complex circuit architectures. But this work shows that the spatial distribution of the electrolyte alone is sufficient.

Stretchable neuromorphic chips: Dai et al.⁹ created the first intrinsically stretchable neuromorphic device using the polymer p(gT2), an organo-hydrogel electrolyte, and vertically grown gold nanowire electrodes embedded in PDMS. The device maintained over 800 distinct conductance states, switching endurance exceeding 10⁸ cycles, and state retention over 10⁴ seconds — all while being stretched to 100% strain. A 3-by-3 prototype array performed vector-matrix multiplication (the fundamental operation of neural networks) on skin, and simulated classification of ECG signals into five cardiac categories maintained approximately 90% accuracy even when the hardware was physically deformed during inference. This is the first demonstration that neuromorphic computation and mechanical stretchability can coexist without performance compromise — a prerequisite for truly wearable brain-machine interfaces.
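The vector-matrix multiplication performed by such an array is worth spelling out, because the physics does the arithmetic. In an idealized crossbar (my simplification: no wire resistance, perfectly linear devices), each device passes a current given by Ohm's law, and each output line sums its currents by Kirchhoff's law. The conductances and voltages below are made up for illustration.

```python
# Vector-matrix multiplication on an idealized 3-by-3 crossbar of conductances.
# Ohm's law gives each device current; Kirchhoff's current law sums each column.
# The conductance and voltage values are made up for illustration.

def crossbar_vmm(voltages, conductances):
    """Return column currents I_j = sum_i V_i * G_ij (amperes)."""
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("need exactly one input voltage per row")
    return [sum(voltages[i] * conductances[i][j] for i in range(n_rows))
            for j in range(n_cols)]

G = [[1e-6, 2e-6, 0.0],    # siemens; each entry is one synaptic weight
     [0.0,  1e-6, 3e-6],
     [2e-6, 0.0,  1e-6]]
V = [0.1, 0.2, 0.3]        # volts applied to the three rows

currents = crossbar_vmm(V, G)
print([f"{i * 1e6:.2f} uA" for i in currents])
```

Every multiply-accumulate happens in parallel and in place, with the weights stored in the same devices that do the multiplication. This is the "computation is memory" idea from earlier in the post, in its simplest form.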

Why Organic polymers? The Case for Wetware

There are so many advantages of organic neuromorphic electronics over silicon alternatives:

Biocompatibility. OECTs operate in aqueous electrolytes at sub-volt potentials, conditions compatible with living cells and tissues. Silicon neuromorphic chips require high voltages and rigid encapsulation that damages biological interfaces. Organic devices can directly interface with neurons, epithelial cells, and biological fluids. A PEDOT:PSS OECT has been used to record dopamine release from PC-12 cells in real time, with the neurotransmitter itself modulating the synaptic weight — a true biohybrid synapse¹⁰.

Multimodal sensing. Because the gate of an OECT can be functionalized with enzymes, antibodies, aptamers, or chemically sensitive materials, the same device can respond to electrical, chemical, mechanical, and optical inputs. A single OECT has been shown to simultaneously process pressure, light, and neurotransmitter signals, performing multisensory integration in hardware — something that requires complex multi-chip architectures in silicon.

Energy efficiency. OECT synapses operate at femtojoule to picojoule energy per event, comparable to biological synapses.

Fabrication. Organic semiconductors can be printed using inkjet, screen printing, or aerosol jet deposition on flexible plastic substrates at room temperature. No cleanroom required. No vacuum deposition. No billion-dollar fab. This matters enormously for the envisioned applications: disposable biosensors, on-skin health monitors, and large-area flexible electronics.

The hard problems that fascinate me

Despite the rapid progress, organic neuromorphic electronics face real challenges.

Speed. The ionic transport that gives OECTs their neuromorphic properties is inherently slower than electronic switching. The fastest OECT synapses operate at around 200 nanoseconds; the fastest neurons reach approximately 500 Hz. This is sufficient for biological interfaces but too slow for the megahertz-and-above clock rates needed for general computing. Organic neuromorphic hardware is not going to replace GPUs, and maybe it need not. There are so many amazing niche applications where speed requirements align with biological timescales.

Stability. Organic materials degrade. PEDOT:PSS is sensitive to humidity. Many OMIECs swell excessively in aqueous environments. Long-term drift in conductance states undermines the reliability of analog memory. Cross-linking, encapsulation, and materials engineering (hydrophobic side chains, ladder polymers) are improving stability, but shelf lives of months, not decades, remain the norm. Is that bad? Well, not if they are cheap to manufacture and replace.

Scale. The largest demonstrated organic neuromorphic arrays are still small. Achieving the thousands or millions of devices needed for practical neural networks requires advances in high-resolution patterning, device-to-device uniformity, and interconnect engineering. The inkjet printing approach is promising for this reason.

Framework. The neuromorphic devices being built don’t map cleanly onto existing machine learning frameworks. The non-linearity and asymmetry of analog weight updates, the stochastic variability between devices, and the different timescales of volatile and non-volatile states all require rethinking algorithms from the ground up.

So where are we heading?

In the near-term future we don’t expect polymers to replace silicon. However, there is potentially a more transformative future of intelligent interfaces between the digital and biological worlds!

Imagine a patch on your skin that continuously monitors your ECG, classifies arrhythmias in real time using on-device neuromorphic inference, and communicates only abnormalities — consuming nanowatts, conforming to your body’s movements, never needing to stream raw data to the cloud. Imagine it tells you about your stress level, biomarkers for cardiovascular health, all unobtrusively.

Imagine neural implants that record brain activity and process it locally — filtering noise, detecting seizure precursors, triggering responsive neurostimulation — all using devices that speak the same ionic language as the neurons they interface with!

Imagine artificial skin for prosthetic limbs that senses pressure, texture, and temperature, converts these stimuli into frequency-coded spike trains, and transmits them to peripheral nerves in a format the nervous system can natively interpret.

These applications don’t require faster-than-silicon speed. They require biocompatibility, energy efficiency, mechanical flexibility, and the ability to process sensory information the way biology does — all areas where organic neuromorphic electronics have a structural advantage.

The brain didn’t evolve to maximize clock speed. It evolved to survive in a noisy, unpredictable, energy-constrained environment by integrating sensing, processing, and memory into a single adaptive substrate. For the first time, we’re building electronic systems on the same design principles — using materials that bend, stretch, and operate in the wet, salty, ion-rich environment of the living body.

The future of polymer electronics isn’t about replacing silicon. It is perhaps about performing wonders silicon can’t.

I lay still for a while, picking up the scattered garments of my mind and trying to assemble some kind of reasonable outfit from them.

~Altered Carbon

References

1. Li, Qifan, et al. A water-processable n-type polymeric ink with conductivities exceeding 1,000 S cm^-1. Matter (2026).
2. Xiang, K., Song, J., Liu, H., Chen, J. & Yan, F. Organic Electrochemical Transistors for Neuromorphic Devices and Applications. Adv. Mater. 38, e15532 (2026).
3. Gerasimov, Jennifer Y., et al. An evolvable organic electrochemical transistor for neuromorphic applications. Advanced Science 6.7 (2019): 1801339.
4. Van De Burgt, Yoeri, et al. A non-volatile organic electrochemical device as a low-voltage artificial synapse for neuromorphic computing. Nature Materials 16.4 (2017): 414-418.
5. Wang, Shijie, et al. An organic electrochemical transistor for multi-modal sensing, memory and processing. Nature Electronics 6.4 (2023): 281-291.
6. Yao, Y., Pankow, R. M., Huang, W. et al. An organic electrochemical neuron for a neuromorphic perception system. PNAS 122, e2414879122 (2025).
7. Harikesh, Padinhare Cholakkal, et al. Ion-tunable antiambipolarity in mixed ion–electron conducting polymers enables biorealistic organic electrochemical neurons. Nature Materials 22.2 (2023): 242-248.
8. Li, M., Zhang, W. et al. Regionally controlled ion-doping of organic electrochemical transistors for computing-memory co-integrated neuromorphic systems. NPJ Flex. Electronics 10, 11 (2026).
9. Dai, S. et al. Intrinsically stretchable neuromorphic devices for on-body processing of health data with artificial intelligence. Matter 5, 3375-3390 (2022).
10. Keene, Scott T., et al. A biohybrid synapse with neurotransmitter-mediated plasticity. Nature Materials 19.9 (2020): 969-973.

May 23, 2026

What if we could predict real-world properties of polymeric molecules from their chemical sequences alone?
Polymeric molecules are all around us and in us. It is hardly surprising that a large fraction of life’s information-carrying molecules are polymeric, from DNA and RNA to proteins, lipids and peptides.

During my PhD I fell in love with polymers. (I had started my PhD work in quantum information but quickly switched to soft-matter physics.) I worked on the vulcanization transition, a second-order phase transition in which a random melt of polymers, like natural rubber, is chemically cross-linked to form a random solid. I later became fascinated by gels, glassy solids, and the deep connections of their physics to percolation theory, random-resistor networks and the jamming transition.

Over the years, I met another fascinating class of polymers, the oligonucleotides: bits of RNA, double-stranded or single-stranded (shRNA & siRNA), and eventually bits of DNA (Anti-Sense Oligonucleotides, or ASOs). Oligonucleotides are informational drugs. They carry the genetic information they are destined to modulate.

We all witnessed the impact of another such informational medicine during Covid-19: the synthetic mRNA polymer that creates the right fragment of a protein to vaccinate. If you think about it, 3 of the 4 medicine modalities are polymers: peptides, antibodies, nucleic acids. Small molecules are the only exception; they carry nebulous information lacking focus and interact with almost everything.

Yet, we understand so much and so little about polymers! When I cofounded Creyon, my dream was to engineer one kind of polymer really well: oligonucleotides. These are bits of nucleic acids that are chemically modified to make them drug-like (functionalization), that can be sent to a cell or tissue and precisely control gene expression! (Isn’t it marvelous that the A, C, G, T code, a quad instead of a bit, could do that? It could manipulate the very information in genes that I need to even see this screen?) These functionalizations (chemical modifications of the base, linker, or sugar unit of the nucleic acid) could fundamentally change their biological, physical, and biochemical properties. They could make these polymers more or less viscous, soluble, serum stable, immunotoxic, bioavailable; sometimes modulating pharmacology measures across four orders of magnitude by a single modification on the same base sequence! We were engineering these molecules and manipulating the information in the informational drug across several axes. We learned how to make the information allele-selective, well-tolerated, have higher affinity, have higher on-rate or activity, and so on.

Lately, I have expanded the scope of that lifelong dream of controlling information flow. The scope is not just human biology and disease, but what more sequences can do and how well we can create them. Obviously, a lot has changed in the last two years! As a society we have marveled at what AI can do when fed a large corpus of textual sequences. Who knew LLMs could get this good at writing not just text sequences but logical sequences of code?

Polymers are just chemical sequences.

Well, one challenge is data. Where are all the data to learn the properties of molecules? Molecules inhabit a very special world. Unlike textual sequences where correlations are hard to quantify but easy to sense, correlations in molecules follow the laws of quantum physics, easy to validate and quantify, but hard to sense by intuition.

The bad news is we need to create these physics-faithful datasets. But the good news is the correlations are nearsighted, a property Walter Kohn called quantum nearsightedness.

We started dreaming that we should be able to predict physical properties, like conformations, free energy etc., purely from the chemical sequence of polymers. As with any dream, you need good partners in crime: David Pekker and Todd Martinez.

We tried to take the simplest first step.

Can we predict the thermal ensemble of polymer conformations from their sequence alone?

Well, we asked ourselves, what is a realistic system that will stress test this unreasonable dream? We tinkered with some internal data, but settled on a large, freely available dataset of MD trajectories of peptides (the mdCATH dataset). The trajectory sampling in this data is almost certainly not ergodic, but hey, beggars can’t be choosers, right? Do you have 1-5 million GPU hours to spare? If it were fully ergodic, we would have gotten very close to computing the free energy of peptides directly from sequence. Wild, right?

What we discovered, once we figured out a few critical things about how to include the physics the right way in Diffusion Transformers, was that we could predict the conformation ensemble as a function of temperature. We did some other work internally to convince ourselves we could do this for other systems too, for example for concentration dependence.

So why care?

Turns out, properties of polymers are driven by their conformations and free energy. Ask any peptide chemist and she will tell you that controlling the degrees of freedom (by macrocyclization) is what you do once you have a lead molecule to stare at and a glass of wine to place some educated chemical bets. Ask a nucleic acid chemist, and she will tell you that a blessed hairpin structure is the reason that an aptamer is a molecular beacon.

But here is the inconvenient truth. Oligo-length (meaning ~10-100 monomer long) polymers (peptides included) are very often highly flexible, and it makes no sense to anchor your expectations of their properties on a single low-energy conformation. Larger proteins are probably a bit different; some of them are folded by chaperones, and it makes sense to use an AlphaFold/ESM/SimpleFold predicted single or closely related structure.

So what next? Well, if we can predict physical properties from sequences, I think an analogy is worth entertaining:

If LLMs understand text, and we are increasingly fascinated by teaching LLMs about the physical (Newtonian) world, what does it take for a molecular AI model like ours to understand the quantum world of molecules? How much data? What kind of “sensors” are analogous to the physical AI sensors and cameras?

And most importantly, where are the limits of molecular engineering? Will you laugh at me if we dream about predicting viscosity? Conductivity? If we engineer the perfect conductive polymer using such generative tools? The perfect tissue-targeting molecule? The perfect precision medicine, ready to be printed?

Read the paper and criticize. We are just getting warmed up! Send your comments!
- https://arxiv.org/abs/2604.14241: Polyformer: a generative framework for thermodynamic modeling of polymeric molecules
May 13, 2026
Tensor Product Attention: Curiosities abound
A recent paper, Tensor Product Attention Is All You Need¹, grabbed my attention. Over the last year, I have been exploring ways to reinterpret the attention mechanism, mainly for my own edification. What correlations does a transformer really capture? And unsurprisingly, I have been leaning on intuition from the physics of correlated systems.

First, the attention mechanism is often written in a mathematically confusing and redundant way in the machine learning literature. The notation is often obfuscated by implementation quirks of matrix multiplications on GPUs. So let’s set up the notation, and simplify.

In the notes below, I will ignore position encoding. RoPE or learnable additive position encodings do not change the foundational mathematical intuitions I am trying to convey here; they are a distraction.

I use $\ell$ for layer index and $h$ for head index.

The key quantity is the residual stream, $X^\ell$ . This matrix is getting transformed by attention and MLP blocks. The embedding dimension $d_\textrm{model}$ is the size of the vector space in which tokens are being embedded.

We need a few other matrices to really explain what’s going on.

Note that in ML/AI papers the Query, Key and Value matrices are always written separately, but in essence we are low-rank decomposing two matrices (each as a product of rectangular matrices), $\mathbf{W}_{QK}^{\ell,h}$ and $\mathbf{W}_{OV}^{\ell,h}$ . This will be clear when we write attention in terms of these matrices:

$\begin{aligned} \text{Attn}^{\ell}(\mathbf{x}_i) = \sum_{h=1}^{H} \sum_{j=1}^{n} \left[ \text{softmax}_j \left( \frac{\mathbf{x}_i^\top \mathbf{W}_{\text{QK}}^{\ell,h} \mathbf{x}_{j}}{\sqrt{d_{\text{head}}}} \right) \right] \mathbf{W}_{\text{OV}}^{\ell,h} \mathbf{x}_j \end{aligned}$

The attention operator $\textrm{Attn}^\ell$ at layer $\ell$ is a sum over individual attention heads, $h$ , with $H$ total heads. Note that here I choose to call the operator the net function that returns a vector of the same size as $\mathbf{x}_i$ ; one can choose to add this back to the residual $X^\ell$ . Some architectures do so, others send it through the MLP operator. There are a lot of different transformer architectures out there in the various LLMs, and for the purpose of this discussion, the choice is unimportant. Moreover, the papers have a bewildering range of definitions of what part is called attention, which is why I bored you with setting up notation. You are welcome.

Note that the number of heads and head dimensions are chosen such that we always have $d_{\text{model}} \times d_{\text{model}}$ matrices in the above expression.
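If the merged-matrix form looks unfamiliar, a quick numerical check may help. The sketch below builds the usual separate query, key, value and output projections with random weights, forms $\mathbf{W}_{QK} = \mathbf{W}_Q \mathbf{W}_K^\top$ and $\mathbf{W}_{OV} = (\mathbf{W}_V \mathbf{W}_O)^\top$, and confirms both routes give the same output. The shapes, the random weights and the row-vector convention for `X` are my choices, and positional encoding is ignored as stated above.

```python
import numpy as np

# Sketch: one attention layer written with the merged matrices W_QK and W_OV,
# checked against the usual separate Q, K, V, O projections.
# Shapes and random weights are arbitrary; positional encoding is ignored.

rng = np.random.default_rng(0)
n, d_model, H = 5, 8, 2
d_head = d_model // H
X = rng.normal(size=(n, d_model))              # rows are token vectors x_i

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)      # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

out_merged = np.zeros((n, d_model))
out_standard = np.zeros((n, d_model))
for h in range(H):
    W_Q, W_K, W_V = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    W_O = rng.normal(size=(d_head, d_model))

    # Standard form: project, score, mix values, project back.
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    A = softmax(Q @ K.T / np.sqrt(d_head))
    out_standard += A @ V @ W_O

    # Merged form: two d_model x d_model matrices of rank at most d_head.
    W_QK = W_Q @ W_K.T
    W_OV = (W_V @ W_O).T                       # acts on column vectors x_j
    A_merged = softmax(X @ W_QK @ X.T / np.sqrt(d_head))
    out_merged += A_merged @ (X @ W_OV.T)

print(np.allclose(out_merged, out_standard))
```

Nothing is gained computationally by merging; the value is conceptual. Each head is fully described by one bilinear form that decides who attends to whom and one linear map that decides what gets moved.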

The only correlation between tokens explored in a transformer is pairwise. The MLP operator acts on the per-token embedding $\mathbf{x}_i$ and does not mix $\mathbf{x}_i$ and $\mathbf{x}_j$ . In the attention operator, the $\textrm{softmax}_j$ term is a normalized weight: every token embedding $\mathbf{x}_j$ in the context window is multiplied by a linear transformation matrix and then summed with this weight. It is really quite simple.

Well, one may wonder — why only pairwise correlations? And, why only the above functional form for pairwise correlations?

A digression — for physicists like me, any time we see pairwise correlations, we think about Potts model, a generalization of the Ising Model which is perhaps better known. In the q-state Potts model the “spins” are unit vectors that point in q symmetric directions of a hypertetrahedron in q-1 dimensions, see here². In the classical Potts model these vectors interact only if their “spins” (state) are the same.

Can we draw an analogy with Potts Model? Yes, of course! Well, a paper³ already did a version of it—with a Potts Model where the interactions are not restricted to same “spins” but mix “spins”. It’s an enticing direction to study the dynamics of transformers using such mappings.

OK, end of digression.

The Memory Bottleneck in Modern Transformers

Large language models face a critical scalability challenge: the Key-Value (KV) cache. During autoregressive generation, standard Multi-Head Attention (MHA) stores keys and values for all previously generated tokens, consuming memory that grows linearly with sequence length:

$\text{Memory}_{\text{MHA}} \sim n \times H \times d_\text{head}$

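See the table to recall the notation. For a model with $H = 32$ and $d_\text{head} = 128$ processing an $n = 10^5$ token context, keys and values together come to about $8 \times 10^8$ stored numbers, which is over 800 MB even at one byte per number, just for the KV cache of a single layer!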
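The size quoted above depends on how many bytes each stored number takes, which the scaling formula leaves open. A small sketch makes the arithmetic explicit; `bytes_per_number` is my assumption, with 2 corresponding to 16-bit floats and 1 to 8-bit storage.

```python
# KV-cache size for standard multi-head attention, per layer.
# bytes_per_number is an assumption: 2 for 16-bit floats, 1 for 8-bit storage.

def kv_cache_bytes(n_tokens, n_heads, d_head, bytes_per_number=2):
    """Bytes needed to cache keys and values for one layer."""
    # One key vector and one value vector per head, per token.
    numbers_per_token = 2 * n_heads * d_head
    return n_tokens * numbers_per_token * bytes_per_number

n, H, d_head = 100_000, 32, 128
for bytes_per_number in (1, 2):
    size_mb = kv_cache_bytes(n, H, d_head, bytes_per_number) / 1e6
    print(f"{bytes_per_number} byte(s) per number: {size_mb:.0f} MB per layer")
```

Multiply by the number of layers and by the batch size and it becomes clear why the cache, not the weights, is often what runs out of memory first at long context.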

The fundamental question is whether we must store the full $H \times d_\text{head}$ representation for each token, or whether a more compact factorized representation can capture the essential structure with minimal information loss.

Tensor Decompositions: A Primer

Before diving into Tensor Product Attention (TPA), we need to understand the landscape of tensor decomposition methods. A tensor is simply a multi-dimensional array—scalars are 0-order tensors, vectors are 1st-order, matrices are 2nd-order, and so on.

CP Decomposition (CANDECOMP/PARAFAC)

The most common Tensor Decomposition is probably the CP decomposition.

Definition (CP Decomposition): A third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ has a rank- $R$ CP decomposition if it can be written as:

$\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$ where $\mathbf{a}_r \in \mathbb{R}^I$ , $\mathbf{b}_r \in \mathbb{R}^J$ , $\mathbf{c}_r \in \mathbb{R}^K$ and $\circ$ denotes the outer product.

Equivalently, element-wise, for indices $i,j,k$ :

$\mathcal{X}_{ijk} = \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr}$

The CP decomposition represents a tensor as a sum of rank-1 tensors (outer products of vectors). This is the natural generalization of matrix SVD to higher orders, though unlike SVD, computing the optimal CP decomposition is NP-hard. Yeah, sucks, right?
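To make the element-wise formula concrete, here is a small pure-Python sketch that rebuilds a tensor from its CP factors. The helper name `cp_reconstruct` and the toy factor matrices are hypothetical, chosen by hand for illustration.

```python
def cp_reconstruct(A, B, C):
    """Rebuild X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r] from CP factors.

    A, B, C are lists of rows with shapes I x R, J x R and K x R.
    """
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Rank-2 example on a 2 x 2 x 2 tensor (toy numbers).
A = [[1, 0], [2, 1]]
B = [[1, 1], [0, 3]]
C = [[2, 1], [1, 0]]
X = cp_reconstruct(A, B, C)
print(X)
```

The factors hold $R(I + J + K)$ numbers while the full tensor holds $IJK$, which is where the compression comes from once the tensor is large and $R$ is small.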

Tucker Decomposition

Another popular tensor decomposition method is the Tucker Decomposition.

Definition (Tucker Decomposition): A Tucker decomposition factorizes a tensor into a core tensor $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$ and factor matrices along each mode: $\mathcal{X} = \mathcal{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C}$ where $\mathbf{A} \in \mathbb{R}^{I \times R_1}$ , $\mathbf{B} \in \mathbb{R}^{J \times R_2}$ , $\mathbf{C} \in \mathbb{R}^{K \times R_3}$ and $\times_n$ denotes the mode- $n$ product.

More directly, in components the decomposition reads:

$\mathcal{X}_{p q r} = \sum_{i=1}^{R_1} \sum_{j=1}^{R_2} \sum_{k=1}^{R_3}\mathcal{G}_{i j k}\, \mathbf{A}_{pi} \,\mathbf{B}_{qj} \,\mathbf{C}_{rk}$

The Tucker decomposition generalizes CP by allowing a dense core tensor. Note that the sizes $R_1, R_2, R_3$ are at most the sizes $I, J, K$ of the corresponding tensor dimensions; a common choice is $R_1 = R_2 = R_3 = \text{min} ( I, J, K)$ . When the core tensor $\mathcal{G}$ is super-diagonal (non-zero only when all indices are equal), Tucker reduces to CP.
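The claim that Tucker reduces to CP for a super-diagonal core can be checked directly. The sketch below is a toy verification in pure Python; the function names and the small integer factor matrices are mine, picked only so the comparison is exact.

```python
def cp_reconstruct(A, B, C):
    """X[p][q][r] = sum_n A[p][n] * B[q][n] * C[r][n] (CP form)."""
    R = len(A[0])
    return [[[sum(A[p][n] * B[q][n] * C[r][n] for n in range(R))
              for r in range(len(C))]
             for q in range(len(B))]
            for p in range(len(A))]


def tucker_reconstruct(G, A, B, C):
    """X[p][q][r] = sum_{i,j,k} G[i][j][k] * A[p][i] * B[q][j] * C[r][k]."""
    R1, R2, R3 = len(G), len(G[0]), len(G[0][0])
    return [[[sum(G[i][j][k] * A[p][i] * B[q][j] * C[r][k]
                  for i in range(R1) for j in range(R2) for k in range(R3))
              for r in range(len(C))]
             for q in range(len(B))]
            for p in range(len(A))]


R = 2
# Super-diagonal core: 1 where i == j == k, 0 elsewhere.
G = [[[1 if i == j == k else 0 for k in range(R)] for j in range(R)] for i in range(R)]
A = [[1, 0], [2, 1], [0, 3]]   # 3 x 2
B = [[1, 1], [0, 3]]           # 2 x 2
C = [[2, 1], [1, 0]]           # 2 x 2
tucker_matches_cp = tucker_reconstruct(G, A, B, C) == cp_reconstruct(A, B, C)
print(tucker_matches_cp)
```

With a dense core instead, the triple sum picks up cross terms between different columns of the factor matrices, which is exactly the extra freedom Tucker has over CP.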

Tensor Train Decomposition

The tensor decomposition most familiar to physicists is probably the tensor train decomposition.

Definition (Tensor Train): A tensor train (TT) or Matrix Product State (MPS) represents an order- $d$ tensor (one with $d$ indices) as a product of matrices:

$\mathcal{X}_{i_1, i_2, \ldots, i_d} = \mathbf{G}^{[1]}_{i_1} \mathbf{G}^{[2]}_{i_2} \cdots \mathbf{G}^{[d]}_{i_d}$

where $\mathbf{G}^{[k]}_{i_k} \in \mathbb{R}^{r_{k-1} \times r_k}$ with $r_0 = r_d = 1$ . The parameters $\{r_1, \ldots, r_k, \ldots, r_{d-1}\}$ are called bond dimensions or TT-ranks.

This is the same structure used to represent quantum many-body states in physics.
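As a small illustration of the definition, the sketch below evaluates a single tensor element by multiplying one matrix per index. The names `matmul` and `tt_element`, and the hand-picked cores with bond dimensions $r_1 = r_2 = 2$, are hypothetical.

```python
def matmul(P, Q):
    """Plain matrix product for lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def tt_element(cores, index):
    """Return X[i_1, ..., i_d], where cores[k][i] is an r_{k-1} x r_k matrix."""
    M = cores[0][index[0]]
    for core, i in zip(cores[1:], index[1:]):
        M = matmul(M, core[i])
    return M[0][0]   # r_0 = r_d = 1, so the product is a 1 x 1 matrix


# Order-3 tensor with physical dimension 2 on every index.
cores = [
    [[[1, 0]], [[0, 1]]],                      # two 1 x 2 matrices
    [[[1, 2], [0, 1]], [[0, 1], [1, 0]]],      # two 2 x 2 matrices
    [[[1], [1]], [[2], [0]]],                  # two 2 x 1 matrices
]
print(tt_element(cores, (0, 0, 0)), tt_element(cores, (1, 1, 1)))
```

Storage grows linearly in the number of indices $d$ (one set of small matrices per index) instead of exponentially, as long as the bond dimensions stay small.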

Tensor Product Attention: The Core Claim

Now we arrive at the key contribution of the TPA paper. Instead of storing full query, key, and value matrices, TPA represents them using contextual low-rank factorizations.

Standard Multi-head Attention

For token $i$ with embedding $\mathbf{x}_i$ , layer $\ell$ and head $h \in \{ 1, \dots, H \}$ :

$\begin{align} \mathbf{q}_i^{\ell,h} = \mathbf{W}_Q^{\ell,h} \mathbf{x}_i \in \mathbb{R}^{d_{\text{head}}} \\ \mathbf{k}_i^{\ell,h} = \mathbf{W}_K^{\ell,h} \mathbf{x}_i \in \mathbb{R}^{d_{\text{head}}} \\ \mathbf{v}_i^{\ell,h} = \mathbf{W}_V^{\ell,h} \mathbf{x}_i \in \mathbb{R}^{d_{\text{head}}} \end{align}$

We can stack all the heads into matrices. Note that these matrices are now not just weights, but weights multiplied by embeddings (the layer index $\ell$ is dropped for brevity):

$\begin{equation} \mathbf{Q}_i = \begin{bmatrix} \mathbf{q}_i^1 \\ \mathbf{q}_i^2 \\ \vdots \\ \mathbf{q}_i^H \end{bmatrix} \in \mathbb{R}^{H \times d_{\text{head}}} \end{equation}$

TPA

TPA factorizes the stacked query/key/value matrices as rank- $R$ sums of outer products.

$\begin{equation} \mathbf{Q}_i = \frac{1}{R_Q} \sum_{r=1}^{R_Q} \mathbf{a}^Q_r(\mathbf{x}_i) \otimes \mathbf{b}^Q_r(\mathbf{x}_i) \in \mathbb{R}^{H \times d_{\text{head}}} \end{equation}$

For clarity, note that the dimensions work out:

$\begin{align} \mathbf{x}_i \in \mathbb{R}^{d_{\text{model}}} \quad \text{(input)} \\ \mathbf{W}^{a,Q}_r \mathbf{x}_i = \mathbf{a}^Q_r(\mathbf{x}_i) \in \mathbb{R}^{H} \quad \text{(head factor)} \\ \mathbf{W}^{b,Q}_r \mathbf{x}_i = \mathbf{b}^Q_r(\mathbf{x}_i) \in \mathbb{R}^{d_{\text{head}}} \quad \text{(feature factor)} \\ \mathbf{a}^Q_r \otimes \mathbf{b}^Q_r \in \mathbb{R}^{H \times d_{\text{head}}} \quad \text{(outer product)} \\ \frac{1}{R_Q}\sum_{r=1}^{R_Q} \mathbf{a}^Q_r \otimes \mathbf{b}^Q_r \, = \mathbf{Q}_i \in \mathbb{R}^{H \times d_{\text{head}}} \quad \checkmark \end{align}$

So for standard MHA, each head independently projects the input—

$\begin{equation} \mathbf{q}_i^h = \mathbf{W}_Q^h \mathbf{x}_i \end{equation}$

whereas for TPA, all heads share $R_Q$ feature vectors, weighted differently per head,

$\begin{equation} \mathbf{q}_i^h = \frac{1}{R_Q} \sum_{r=1}^{R_Q} \underbrace{[\mathbf{a}^Q_r(\mathbf{x}_i)]_h}_{\text{head-specific weight}} \cdot \underbrace{\mathbf{b}^Q_r(\mathbf{x}_i)}_{\text{shared feature vector}} \end{equation}$

The Key Idea: Instead of H independent $d_\text{head}$ -dimensional vectors (one per head), TPA uses—
- $R_Q$ shared feature vectors $\mathbf{b}^Q_r \in \mathbb{R}^{d_{\text{head}}}$
- $R_Q$ weight vectors $\mathbf{a}^Q_r \in \mathbb{R}^H$ — one scalar per head, determining how much each head uses each feature
where $R_Q \ll H$ , therefore leading to parameter efficiency. Obviously, we have similar things going on for $\mathbf{K}_i$ and $\mathbf{V}_i$ .
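The construction above can be sketched in a few lines of pure Python. This is a toy illustration under my own assumptions (random Gaussian weights, tiny dimensions, hypothetical names `matvec`, `random_matrix` and `tpa_query`), not the paper's implementation.

```python
import random


def matvec(W, x):
    """Plain matrix-vector product for a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def tpa_query(x, Wa, Wb):
    """Build Q = (1/R) * sum_r a_r(x) outer b_r(x) for a single token.

    Wa[r] is H x d_model (head factor), Wb[r] is d_head x d_model (feature factor).
    """
    R = len(Wa)
    a = [matvec(Wa[r], x) for r in range(R)]   # R vectors in R^H
    b = [matvec(Wb[r], x) for r in range(R)]   # R vectors in R^{d_head}
    H, d_head = len(a[0]), len(b[0])
    Q = [[sum(a[r][h] * b[r][c] for r in range(R)) / R for c in range(d_head)]
         for h in range(H)]
    return Q, a, b


random.seed(0)
d_model, H, d_head, R_Q = 8, 4, 3, 2   # toy sizes, not the paper's
Wa = [random_matrix(H, d_model) for _ in range(R_Q)]
Wb = [random_matrix(d_head, d_model) for _ in range(R_Q)]
x = [random.gauss(0, 1) for _ in range(d_model)]

Q, a, b = tpa_query(x, Wa, Wb)
print(len(Q), len(Q[0]))   # H rows, d_head columns
```

Each row of `Q` is one head's query, and every row is a mix of the same `R_Q` feature vectors `b[r]`, so the stacked matrix has rank at most `R_Q`.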

Parameter counts

For MHA, the total number of parameters for queries only (and similarly for keys and values) is $H \times d_\text{head} \times d_\text{model} = d^2_\text{model}$

For TPA we have—
- Head factors: $R_Q$ matrices of size $H \times d_\text{model}$
- Feature factors: $R_Q$ matrices of size $d_\text{head} \times d_\text{model}$
- Total parameters— $R_Q (H + d_\text{head} ) d_\text{model}$
Example with typical paper values: $H=32$ , $d_{\text{head}}=128$ , $d_{\text{model}}=4096$ , $\boxed{R_Q=6}$ :
- MHA: $32 \times 128 \times 4096 = 16{,}777{,}216$ parameters
- TPA: $6 \times 4096 \times (32 + 128) = 3{,}932{,}160$ parameters
- TPA uses ~23% of MHA’s parameters
Note: Unlike LoRA which factorizes weights, TPA factorizes activations. This means the factorization is contextual—it depends on the input token $\mathbf{x}_i$ . It’s a very interesting idea in how to capture input-dependent structure while maintaining compression!
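The parameter counts in the example above can be reproduced with a couple of one-liners. The function names are hypothetical; the formulas are the ones stated in this section.

```python
def mha_query_params(H, d_head, d_model):
    """Query projection parameters for standard MHA: H * d_head * d_model."""
    return H * d_head * d_model


def tpa_query_params(R_Q, H, d_head, d_model):
    """Query factor parameters for TPA: R_Q * (H + d_head) * d_model."""
    return R_Q * (H + d_head) * d_model


mha = mha_query_params(H=32, d_head=128, d_model=4096)
tpa = tpa_query_params(R_Q=6, H=32, d_head=128, d_model=4096)
print(mha, tpa, f"{tpa / mha:.1%}")
```

Note that the saving depends on $R_Q (H + d_\text{head})$ staying below $H \, d_\text{head}$; with these numbers the break-even rank would be $4096 / 160 \approx 25$.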

Memory Reduction

The major advantage claimed by the paper is the memory saving in KV cache. My interest in this paper is beyond this, to study other forms of attention, but it’s useful to note the memory arguments.

From standard MHA we have—
- Store $\mathbf{K}_i \in \mathbb{R}^{H \times d_{\text{head}}}$ and $\mathbf{V}_i \in \mathbb{R}^{H \times d_{\text{head}}}$
- Total: $2 \times H \times d_{\text{head}} = 2d_{\text{model}}$
TPA stores only the factors—
- Store $\{\mathbf{a}^K_r(\mathbf{x}_i)\}_{r=1}^{R_K}$ and $\{\mathbf{b}^K_r(\mathbf{x}_i)\}_{r=1}^{R_K}$ for keys
- Store $\{\mathbf{a}^V_r(\mathbf{x}_i)\}_{r=1}^{R_V}$ and $\{\mathbf{b}^V_r(\mathbf{x}_i)\}_{r=1}^{R_V}$ for values
- Total: $(R_K + R_V)(H + d_{\text{head}})$
The compression ratio is

$\rho = \frac{(R_K + R_V)(H + d_{\text{head}})}{2H \, d_{\text{head}}}$

Concrete example: $H = 32, d_{\text{head}} = 128, R_K = R_V = 1$ :
- TPA cache $= 2 \times (32 + 128) = 320$ values per token
- MHA cache $= 2 \times 32 \times 128 = 8192$ values per token
so TPA leads to a $96 \%$ memory reduction! For a context window of 100,000 tokens, MHA needs about 1.6 GB of memory whereas TPA needs 64 MB (both per layer, assuming 16-bit values)!
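To see how the compression ratio $\rho$ moves with the ranks, here is a short sketch that evaluates the formula above for a few values of $R_K = R_V$. The function name `tpa_cache_ratio` and the choice of ranks to sweep are mine.

```python
def tpa_cache_ratio(R_K, R_V, H, d_head):
    """Return (TPA values per token, MHA values per token, ratio rho)."""
    tpa = (R_K + R_V) * (H + d_head)
    mha = 2 * H * d_head
    return tpa, mha, tpa / mha


for R in (1, 2, 4):
    tpa, mha, rho = tpa_cache_ratio(R, R, H=32, d_head=128)
    print(f"R_K = R_V = {R}: {tpa} vs {mha} values per token, rho = {rho:.3f}")
```

The ratio grows linearly with the ranks, so the headline saving is specific to rank 1; higher ranks trade some of it back for expressiveness.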

Connection to MPS

Another way to look at TPA is recasting it as an MPS. Per head, instead of the term $\mathbf{x}_{i}\mathbf{W}_{\text{QK}}^{\ell,h} \mathbf{x}_{j}$ in MHA, for TPA we have

$\begin{align} (\mathbf{q}_i^h)^\top \cdot \mathbf{k}_j^h = \left(\frac{1}{R_Q} \sum_{r=1}^{R_Q} [\mathbf{a}^Q_r]_h \cdot \mathbf{b}^Q_r\right)^\top \cdot \left(\frac{1}{R_K} \sum_{s=1}^{R_K} [\mathbf{a}^K_s]_h \cdot \mathbf{b}^K_s\right) \\ = \frac{1}{R_Q R_K} \sum_{r=1}^{R_Q} \sum_{s=1}^{R_K} ([\mathbf{a}^Q_r]_h \cdot \mathbf{b}^Q_r)^\top \cdot ([\mathbf{a}^K_s]_h \cdot \mathbf{b}^K_s) \\ = \frac{1}{R_Q R_K} \sum_{r=1}^{R_Q} \sum_{s=1}^{R_K} \underbrace{[\mathbf{a}^Q_r(\mathbf{x}_i)]_h \cdot [\mathbf{a}^K_s(\mathbf{x}_j)]_h}_{\text{head-space mixing}} \cdot \underbrace{(\mathbf{b}^Q_r(\mathbf{x}_i))^\top \cdot \mathbf{b}^K_s(\mathbf{x}_j)}_{\text{feature-space contraction}} \end{align}$

Now we are getting somewhere, right? That’s a very different take on the attention matrix capturing token-token correlations!
- Rank indices $(r,s)$ play the role of bond indices in MPS
- $\sum_{r=1}^{R_Q} \sum_{s=1}^{R_K}$ is the bond contraction
- Low ranks $R_Q, R_K$ correspond to low bond dimension and increased efficiency, while high bond dimension leads to more expressiveness
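
The expansion of the per-head score can be checked numerically. The sketch below compares the direct dot product of the head-$h$ query and key with the double sum over rank indices, carrying the $1/(R_Q R_K)$ prefactor from the first line of the expansion. All names and the random toy factors are my assumptions.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def random_vectors(n, dim):
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]


def head_vector(a, b, h):
    """(1/R) * sum_r a[r][h] * b[r]: the head-h query or key built from factors."""
    R = len(a)
    return [sum(a[r][h] * b[r][c] for r in range(R)) / R for c in range(len(b[0]))]


def factorized_score(aQ, bQ, aK, bK, h):
    """Double sum over rank indices (r, s), with the 1/(R_Q * R_K) prefactor."""
    RQ, RK = len(aQ), len(aK)
    total = sum(aQ[r][h] * aK[s][h] * dot(bQ[r], bK[s])
                for r in range(RQ) for s in range(RK))
    return total / (RQ * RK)


random.seed(1)
H, d_head, R_Q, R_K = 4, 3, 2, 3   # toy sizes
aQ, bQ = random_vectors(R_Q, H), random_vectors(R_Q, d_head)   # factors for token i
aK, bK = random_vectors(R_K, H), random_vectors(R_K, d_head)   # factors for token j

max_gap = max(
    abs(dot(head_vector(aQ, bQ, h), head_vector(aK, bK, h))
        - factorized_score(aQ, bQ, aK, bK, h))
    for h in range(H)
)
print(max_gap < 1e-9)
```

The feature-space contractions `dot(bQ[r], bK[s])` do not depend on the head index, so they can be computed once and reused across all heads.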

Copy Tensor

We can look at the above expression in terms of copy tensors in Tensor Networks. A copy tensor⁴ allows for reusing information. For a vector $\mathbf{a} \in \mathbb{R}^d$ , the copy operation is represented by a diagonal tensor, $\mathcal{C}_{ij} = \delta_{ij}$ , the Kronecker delta. In other words, a copy tensor allows a single input to be reused in multiple tensor contractions.

Note what’s happening in TPA! The same input vector $\mathbf{x}_i$ is used $2 R_Q$ times for Query, and so on for Key and Value —

$\begin{align} \mathbf{x}_i \xrightarrow{\mathbf{W}^{a,Q}_1} \mathbf{a}^Q_1(\mathbf{x}_i) \in \mathbb{R}^H \\ \mathbf{x}_i \xrightarrow{\mathbf{W}^{b,Q}_1} \mathbf{b}^Q_1(\mathbf{x}_i) \in \mathbb{R}^{d_{\text{head}}} \\ \vdots \\ \mathbf{x}_i \xrightarrow{\mathbf{W}^{a,Q}_{R_Q}} \mathbf{a}^Q_{R_Q}(\mathbf{x}_i) \in \mathbb{R}^H \\ \mathbf{x}_i \xrightarrow{\mathbf{W}^{b,Q}_{R_Q}} \mathbf{b}^Q_{R_Q}(\mathbf{x}_i) \in \mathbb{R}^{d_{\text{head}}} \end{align}$

Instead of computing H independent projections (standard MHA), TPA computes $2 R_Q$ projections and cleverly recombines them. When $R_Q \ll H$ , this architecture is much more efficient while maintaining expressiveness of a Tensor Network (outer product).

Few other things…
- The paper shows that TPA is compatible with RoPE embedding. RoPE only acts on the $\mathbf{b}$ vectors. The keys are pre-rotated and stored, so no rotation is needed during decoding. Only the current query needs to be rotated. Neat!
- Remarkably, standard attention mechanisms are non-contextual variants of TPA! They show that both GQA (Grouped Query Attention) and MQA (Multi-Query Attention) are simply a poor man’s version of TPA with $\mathbf{a}$ being independent of $\mathbf{x}_i$ !

I loved the paper. The key lessons:
1. Structure matters: Exploiting low-rank structure in attention patterns enables massive compression
2. Contextual factorization: Factorizing activations (not weights) is a very interesting concept
3. Model performance and memory needs: As several other recent works also suggest, the belief that a larger context window either means larger models, or forces a compromise on the expressivity of the correlations captured in attention, may be incorrect

As we push toward longer contexts and larger models, principled compression techniques like TPA are a fruitful area of research. The tensor network perspective suggests we’ve only begun to explore the space of possible architectures!

References
1. Zhang, Yifan, et al. “Tensor product attention is all you need.” arXiv preprint arXiv:2501.06425 (2025).
2. Wu, Fa-Yueh. “The Potts Model.” Reviews of Modern Physics 54.1 (1982): 235.
3. Rende, Riccardo, et al. “Mapping of attention mechanisms to a generalized Potts Model.” Physical Review Research 6.2 (2024): 023057.
4. Glasser, Ivan, Nicola Pancotti, and J. Ignacio Cirac. “From probabilistic graphical models to generalized tensor networks for supervised learning.” IEEE Access 8 (2020): 68169-68182.
January 30, 2026
Carrier-free mRNA delivery with Aptamers: Nucleic acid is all you need
Folks who, like me, have been dreaming happily over the past decade that nucleic acids are the right substrate for engineering medicines: well, here is one more piece of evidence that we might just be right in our obsession with this marvelous polymer of life!

Here is how I evangelized my obsession amongst colleagues.

Silicon Valley is silicon valley and not germanium valley — germanium just wasn’t the right substrate though the first transistor was made of germanium after all, see here for the first paper and here for a lovely history of the transistor.

Aren’t you glad? — Germanium Valley just doesn’t quite have the right euphony, does it?

Nucleic acids are the right substrate for genetic and gene-centric medicines and I don’t think either small molecules or proteins are. Those are the germanium of genetic medicines — they may work but the sooner you use silicon the sooner we will solve all human diseases. Yeah, I am opinionated!

Circularized RNA + cell-type targeting aptamer

A fascinating paper quietly appeared on BioRxiv¹ about a month or so back. It’s a collaboration amongst multiple groups in China, with Weihong Tan as the PI.

They report first-in-human testing of a very curious idea I had toyed with for a while now as a high-risk high-reward R&D project. They created aptamer-embedded circular RNAs (Apt-circRNAs). What’s wild is that they tested the concept in a Phase 1 human trial right away, when it would otherwise still be a marvelous proof-of-concept tested in an ex vivo (blood) setting or in in vivo (humanized rodent) studies.

They got human clinical data. Wow!

The study combines two distinct and established ideas in nucleic acids—
- Circularizing of synthetic mRNA to enhance stability (the payload)
- Use of aptamers as a targeting molecule for cell-type specific delivery
This is a totally crazy pace of testing out platform ideas. For those of you who do not work in the field: why is this significant?

Current mRNA vaccines like those for COVID-19 rely on LNPs for delivery, which can sometimes cause immunogenicity and predominantly accumulate in the liver. The Apt-circRNA platform is clever: the RNA molecule itself contains targeting information (receptor-targeting aptamers) to achieve cell-type-specific delivery, eliminating the need for synthetic carriers like LNPs to gift wrap the RNA.

The Three-Module Design

The Apt-circRNA platform elegantly integrates three functional modules into a single RNA molecule—

Targeting Module

The authors embedded dendritic cell (DC)-specific aptamers at precise locations within the circular RNA scaffold. They tested three targeting aptamers—nucleolin (nuc), transferrin receptor (waz), and DEC-205 (also called CD-205) (min2).
- Recall that DEC-205 (also called CD-205) is a cell-surface receptor (and endocytic receptor) highly expressed in immature Dendritic Cells (DCs).
- Transferrin receptor (TfR) is highly expressed in mature DCs and is crucial for iron uptake.
- Nucleolin is also a cell-surface receptor in endothelial cells and DCs. It can internalize from the cell surface to the nucleus.
They used the Waz aptamer sequence for TfR. Waz was created by Matt Levy, whom I had hired at Creyon Bio, and who has led the aptamer team and created a diversity of cell-type specific aptamers since. We know this aptamer well!²³⁴

Waz aptamer sequence from Ref. 3

The Waz aptamer has chemical modifications as far as I remember—2’F modified C/Us and probably 2’OMe for some positions. The study uses native RNA—so there are no chemical modifications on the aptamer sequence. It’s an important distinction to keep in mind.

The DEC-205 aptamer min2 was also discovered by Matt’s group.⁵ The sequence from Fig.1 of Ref. 4 is:

Min2 aptamer Ref 4

The waz aptamer showed superior binding to both murine and human DCs. Through some optimization, the study determined that a bispecific combination of 5 waz and 4 min2 aptamers yielded optimal antigen presentation. Interesting! That’s a lot of aptamers decorating the RNA!

Stable expression framework

The circular RNA architecture provides inherent nuclease resistance by eliminating free 5’/3′ termini. The study demonstrates that the Apt-circRNA maintains structural integrity for over 24 hours, dramatically outperforming N1-methylpseudouridine-modified linear mRNA (m1Ψ-mRNA). This confirms older work that circularization of mRNA helps in extending half-life⁶⁷. The construct also remains stable across pH 4.0-8.0, which is critical for endosomal trafficking.

Antigen-Encoding Region

An Internal Ribosome Entry Site (IRES) enables cap-independent translation, while codon-optimized sequences encode tumor-specific antigens. The modular design permits flexible incorporation of diverse antigens, demonstrated with ovalbumin peptides ranging from 8 to 386 amino acids.

The Clever Circularization Strategy

Here’s where the molecular engineering gets quite ingenious! The team adapted permuted intron-exon (PIE) ribozyme systems from two sources: Anabaena pre-tRNA and T4 bacteriophage td intron. The key innovation: they engineered the aptamer’s stem-loop structure to serve as the circularization site without mutating the aptamer sequence itself!

The process works by introducing a cleavage site within the aptamer’s loop region, then engineering the group I intron’s P1 and P10 guide sequences to complement sequences flanking the aptamer cleavage site. This enables precise, ribozyme-catalyzed splicing at the predefined loop site, generating Apt-circRNA products free of residual intron sequences.

Cute!

What’s the bio-distribution of Apt-circRNA?

PET Imaging reveals precise lymph node targeting!

They used radio-labeled Apt-circRNA and positron emission tomography (PET) to track spatial and temporal distribution.

PET imaging showed predominant renal accumulation with no notable off-target accumulation in liver, spleen, heart, or other major organs. This specificity is striking and addresses a major concern with LNP-based systems, which accumulate significantly in liver and spleen. The renal accumulation is perhaps owing to renal clearance of such a relatively smaller molecular weight payload?

They also ran a Cy5-labelled study: at 6 hours post-injection, Cy5-labelled Apt-circRNA had preferentially accumulated in dendritic cells compared to B cells and macrophages. Apt-circRNA was efficiently internalized by DCs at the injection site!

This contrasts sharply with LNP-circRNA, which primarily remained as intact nanoparticles near the injection site before uptake by both B cells and DCs within lymph nodes. This is consistent with the expectation that aptamer-mediated recognition enables direct DC internalization and lymph node trafficking.

The study also looked at immuno-stimulatory responses. Worth a read.

Surprisingly, Luminex assays revealed that Apt-circRNA elicited lower systemic levels of reactogenicity-associated chemokines and cytokines than LNP-circRNA except IL-12. Apt-circRNA also demonstrated reduced cytotoxicity in BMDCs (murine bone-marrow derived DCs) compared to LNP-circRNA. I would have expected the opposite—recall that native RNA could invoke innate response from pathways that sniff out cytosolic RNA.

First-in-Human Clinical Trial

The authors initiated a Phase I clinical trial at Zhejiang Xiaoshan Hospital with remarkable speed, testing Apt-circRNA-KR2 in healthy volunteers. Here KR2 refers to the RNA payload expressing the mutations (G12D and G12V) most common in the KRAS gene. They want to elicit a T-cell response in KRAS-mutant cancer.

G12D and G12V are two of the most common single amino acid substitutions at codon 12 of the KRAS gene. The G12D indicates a substitution of the normal amino acid Glycine (G) with Aspartic acid (D) at position 12, while G12V indicates a substitution with Valine (V).

The trial enrolled 12 healthy volunteers total. Though a small cohort, it’s very encouraging!
- Single-dose escalation cohort: 9 volunteers received 50, 100, or 250 μg doses (n=3/group)
- Multi-dose cohort: 3 HLA-A*02:01 or HLA-A*11:01-positive volunteers received 250 μg on days 1, 7, and 13
- Only 1 of 12 participants experienced any adverse event—transient flu-like symptoms resolving within 12 hours
- Zero injection-site reactions (0/12)
- Zero grade ≥2 adverse events (0/12)
- All hematologic parameters, immune cell subsets, and cytokines remained within normal ranges through 180 days
The safety profile is quite surprising and compares very favorably to LNP-mRNA vaccines, which commonly cause injection-site reactions, fever and systemic symptoms.

High notes

Unlike previous ‘naked mRNA’ approaches that lack stability and targeting, it’s a clever idea to encode the aptamer within the RNA sequence itself. However, I am not convinced this is necessary, and it will prohibit chemical modifications that can stabilize the aptamer and increase its affinity and half-life. One could conjugate the aptamer to the circular RNA and modularize the system further.

The dual aptamer strategy is definitely very interesting. The combination of waz (TfR-targeting) and min2 (DEC-205-targeting) creates a bispecific design that enhances both binding affinity and functional outcomes. Are these serving different purposes in directing the payload to endocytic compartments?

Manufacturing could be quite scalable, as is! The in vitro transcription and ribozyme-mediated circularization can be performed at scale without the complex formulation processes required for LNPs. The >80% circularization efficiency is commercially viable. Also, perhaps this advantage is a counter-point to the first point I made about chemically-modified aptamers, for which the aptamer would have to be separately synthesized and somehow conjugated to the RNA at specific sites that do not disrupt expression. Moreover, with multiple aptamers to decorate the RNA, it’s messy. Efficiency and purity of product would be a challenge.

Cellular uptake was still a bit low, and I suspect this is because the aptamers were not really optimized. They took existing aptamer sequences and slapped them on. Flow cytometry data showed that even in draining lymph nodes only a fraction of DCs take up Apt-circRNA. This may necessitate higher doses or more frequent administration compared to LNP-formulated mRNA, partially offsetting the manufacturing advantages. But I strongly believe this is a solvable engineering problem.

Also, the naked RNA formulation and size lead to rapid renal clearance. One could incorporate sugar / lipid modifications to the construct, but then the modules need to be separate: the circularized RNA and the aptamer (chemically modified and, say, with added ligands like PEG attached, or albumin-binding aptamers). Totally doable!

The bigger picture

For decades, the field has focused on engineering increasingly sophisticated nanocarriers: optimizing lipid chemistry, surface modifications etc. The thought-provoking question here is: why engineer a carrier when you can engineer the RNA itself?

It’s a tantalizing possibility—Nucleic acid is all you need.

References
September 17, 2025
