What old logic papers know about the scientific process that challenges your AI scientist
If you are a scientist like I am, or a clinician, let me portray a likely scenario. You are excited about your AI Scientist / Research Agent, so you set it up for a research task.
Your agent reads four hundred papers before breakfast, proposes a hypothesis, designs the experiment, writes the code, runs it, and hands you a result it is ostensibly pleased with.
You feel like Tony Stark telling Jarvis, “Thrill me!” and Jarvis replying, “The render is complete.” “A little ostentatious, don’t you think?”
Then, a new finding lands on the table that flatly contradicts the hypothesis. You feed it in. The agent thinks for a moment and gives you, well, a revised answer. Suave. Maybe its confidence ticks down, maybe an internal ranking reshuffles, maybe it just rewrites the paragraph in a slightly more hedged voice. Its mind, if we could call it a mind, has been changed.
Now ask it the only two questions that actually matter in science:
- Which of your beliefs just took the hit?
- What, exactly, would change your mind back?
It cannot tell you. Not because it’s stupid (it is, embarrassingly, smarter than you on several fronts) but because the reasoning that produced the first answer and the reasoning that produced the second both happened inside a forward pass, in the guts of a machine (an LLM) that transiently activated a stupendously large nexus of artificial neurons in both cases, mostly indistinguishably. You can ask it to narrate its reasoning, and it will, fluently and at length, in the confident cadence of Jarvis. But that narration is generated after the fact and isn’t the mechanism of reasoning. What you want is an argument.
This is the gap I want to talk about. A small group of logicians and philosophers worked out exactly what’s missing here, in detail, with theorems, before some of the people building today’s AI scientists were born. The newest paper I’m going to lean on is from 2004. The oldest is from 1987. They are not famous. They are, I will argue, the missing manual.
Science is a game with three logic moves
Forget agents for a second and ask a more basic question: what is the mechanism of doing science?
Charles Sanders Peirce (American polymath, logician, the kind of person who invented three fields before breakfast and died broke) answered this in the 19th century. Claudio Delrieux dusted it off and formalized it beautifully in a 2004 paper with the unglamorous title Abductive inference in defeasible reasoning: a model for research programmes. Peirce’s answer is that scientific reasoning is a triad of inference types that run in a specific order:
- Induction gathers the evidence. You collect observations. This is the move large language models (LLMs) are pretty good at, because the whole training objective is to compress a planet’s worth of regularities. Induction is probabilistic.
- Abduction finds the best explanation for the evidence. You see a surprising fact and reverse-engineer a story that would make it unsurprising. Peirce called this the source of every genuinely new idea, the only one of the three that adds content instead of rearranging it. For example, your lawn is wet this morning. You abduce that it may have rained last night.
- Deduction then predicts what else must be true if your explanation is right, so you can go test it. If it rained, the car must be wet too. Let’s check the car. Deduction is a favorite of mathematicians. Axioms and rules lead to proofs.
And then you loop. The deduction sends you back out to gather more evidence, which throws up something your explanation didn’t predict, which forces a new explanation, which makes new predictions, and so on.
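The loop is easy to sketch in a few lines of Python. This is a toy under loud assumptions: the `RULES` table, the function names, and the rival `sprinkler` hypothesis are all invented for illustration and come from neither Peirce nor Delrieux. Induction appears only as the set of observations handed in.

```python
# Toy Peircean loop: abduce an explanation, deduce its predictions, test them.
# RULES maps each hypothetical explanation to the observations it predicts.
RULES = {
    "rained": {"lawn_wet", "car_wet"},
    "sprinkler": {"lawn_wet"},
}

def abduce(observations, ruled_out):
    """Return surviving explanations that account for every observation."""
    return [h for h, predicted in RULES.items()
            if h not in ruled_out and observations <= predicted]

def deduce(hypothesis):
    """Return everything that must be observed if the hypothesis holds."""
    return RULES[hypothesis]

def peirce_loop(observations, world):
    """world maps an observation name to True/False when we go and check it."""
    ruled_out = set()
    while True:
        candidates = abduce(observations, ruled_out)
        if not candidates:
            return None, ruled_out           # nothing explains the evidence
        best = candidates[0]                 # abduction: pick a story to test
        failed = {p for p in deduce(best) if not world[p]}  # deduction, then test
        if not failed:
            return best, ruled_out           # held provisionally, not proven
        ruled_out.add(best)                  # defeated: back to abduction

# The lawn is wet, but when we check the car it turns out to be dry.
world = {"lawn_wet": True, "car_wet": False}
best, ruled_out = peirce_loop({"lawn_wet"}, world)
print(best, ruled_out)  # sprinkler {'rained'}
```

The detail to notice is the return value: the loop hands back the defeated hypotheses along with the survivor, so the retraction is recorded and not silently overwritten.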
The surprising fact is that every step of this loop is defeasible. None of it is unassailable: each conclusion is held provisionally, with the sane expectation that tomorrow’s evidence might revoke it. Delrieux’s whole project is a reasoner that can draw useful conclusions even when the knowledge it’s working from isn’t known to be true, and even when it’s known to be false. Read that twice. That’s not a bug. That is precisely what doing science before you’ve figured it out feels like.
So defeasible is the important idea here. Before we go further, let’s get definitions right. The three Peircean moves are summarized in the table. Notice the last column. Each inference, however different, gets defeated, and it gets defeated in a way that is characteristic of its kind. I have framed the examples to appeal to the perspective of a clinician. You can reframe to suit your field of expertise.
| Kind | What it does | The call at the bedside | How that very call gets defeated |
|---|---|---|---|
| Deduction | Necessary, monotonic consequence: if the premises hold, the conclusion must hold. | Potassium’s back at 7.2. Severe hyperkalemia stops hearts, so we need to treat it now! | The patient feels fine and the ECG is entirely normal. Wait! The sample hemolyzed in the tube and the 7.2 is an artifact. The clinician does not doubt her logic; she doubts a premise and retracts her former deduction. |
| Induction | Ampliative. Generalize from many cases to the next one, assigning probabilities. | Every child I’ve seen with this exact runny-nose-and-fever has had a harmless virus. Most likely this child does too. | The child comes back with a stiff neck and a purplish rash. Tests reveal meningitis. The generalization is gone. Black swan. |
| Abduction | Inference to the best explanation. Adopt the most explanatory story and then go test it. | Crushing chest pain, sweating, radiating to the left arm, age sixty. Best running hypothesis: heart attack. | The pain is tearing, it bores through to the back, and the pulses in the two arms do not match. It is an aortic dissection, and that reverses the treatment. The winning hypothesis lost to a rival that explained more. |
Notice what every row had in common: a conclusion you would genuinely act on. Each conclusion was withdrawn when new evidence surfaced, exposing a dubious premise, a black swan, or a better explanation. That withdrawal is defeasibility, and it is the crux of good reasoning hygiene.
Note that defeasible is not a distinct category of inference. It is rather a mode: Deduction, Induction, and Abduction can each be defeasible. The central thesis of this essay is that today’s AI scientists perform Deduction and Induction fairly well and Abduction moderately well, but are bad at this defeasible mode.
How, exactly, does a conclusion get defeated? There are two ways. For that we go back to 1987.
Two ways to lose an argument
John Pollock’s 1987 paper Defeasible Reasoning opens with the most pedestrian example imaginable and it’s worth reproducing it.
Something looks red to you. That’s a perfectly good reason to believe it is red — Pollock calls it a prima facie reason, good to hold true until further notice. Now, two completely different things can go wrong with that belief.
Way one: someone produces irrefutable evidence that the thing is actually white and your eyes tricked you. You now have a reason to believe the opposite of your conclusion. Pollock calls this a rebutting defeater: it attacks the conclusion head-on. It’s the kind everybody already understands, and the kind almost all early AI modeled. Evidence for and against; we settle it on the merits of the evidence.
Way two is sneakier and far more important. Someone tells you the room is lit by a red light. They have not given you a single reason to think the thing isn’t red — it might well be. What they’ve poisoned is the link between “looks red” and “is red.” Under red lights, looking-red stops carrying information about being-red. Pollock calls this an undercutting defeater — it attacks the inference, not the conclusion.
Let’s ponder how different these are. A rebutting defeater changes your answer. An undercutting defeater changes whether you’re entitled to an answer at all: it cuts the wire between premise and conclusion while leaving both endpoints untouched.
Here’s the same distinction with the stakes turned up. A patient has low back pain; the prima facie read is mechanical, treat conservatively. Two very different things can arrive:
- The rebutting defeater walks in as a red flag: unexplained weight loss, night pain that wakes them, a history of breast cancer. That’s a reason for the opposite conclusion — it argues the problem is not mechanical, and attacks the diagnosis directly.
- The undercutting defeater is the one that gets people sued. You order a test to rule something out. It comes back negative. Normally that would lower your suspicion. But this particular patient was on steroids that masked the inflammation, so the negative result tells you nothing. It doesn’t argue the patient is fine. It severs the link between test negative and all clear. Nothing looks wrong. The number came back but the reasoning from that number is dead.
The undercutting case is the dangerous one precisely because nothing looks wrong, and a reasoner that only tracks rebutting defeaters, only ever asking “do I have evidence against my conclusion?”, sails straight past it.
Now go back to our agent. Its answer shifted: a confidence slid, a ranking reshuffled, a paragraph got more hedged. Which kind of defeat was that? Did the new evidence argue against the hypothesis (rebutting), or reveal that the experiment producing the original evidence was confounded (undercutting)? Those demand opposite next moves — revise the theory versus throw out the measurement and redo it.
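To make the distinction concrete, here is a minimal sketch in which a defeater carries its kind as data, so the kind of defeat decides the next move. The class names, the `NEXT_MOVE` table, and the status strings are hypothetical choices for this illustration, not Pollock’s formalism.

```python
from dataclasses import dataclass
from enum import Enum

class DefeatKind(Enum):
    REBUTTING = "rebutting"        # attacks the conclusion itself
    UNDERCUTTING = "undercutting"  # attacks the link from premise to conclusion

@dataclass(frozen=True)
class Defeater:
    kind: DefeatKind
    reason: str

# Hypothetical routing table: the next move depends on the kind of defeat.
NEXT_MOVE = {
    DefeatKind.REBUTTING: "revise the theory",
    DefeatKind.UNDERCUTTING: "throw out the measurement and redo it",
}

def assess(conclusion, defeaters):
    """Return (status, next_moves) for a prima facie conclusion."""
    if not defeaters:
        return "held provisionally", ["act on: " + conclusion]
    kinds = {d.kind for d in defeaters}
    if DefeatKind.REBUTTING in kinds:
        status = "rebutted"       # there is a reason to believe the opposite
    else:
        status = "unsupported"    # the original evidence no longer counts
    moves = [NEXT_MOVE[k] for k in DefeatKind if k in kinds]
    return status, moves

red_light = Defeater(DefeatKind.UNDERCUTTING, "the room is lit by a red light")
shown_white = Defeater(DefeatKind.REBUTTING, "the thing was shown to be white")
print(assess("the thing is red", []))
print(assess("the thing is red", [red_light]))
print(assess("the thing is red", [shown_white]))
```

A confidence score that merely drops cannot tell the second call from the third. The typed defeater can, and the two calls return opposite next moves.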
When two good arguments fight it out
Fine — track defeaters explicitly. Now you’ve got the problem that gets interesting: what do you do when two perfectly respectable lines of reasoning point in opposite directions and neither defeats the other?
The canonical example is so good it has a name: the Nixon Diamond. Nixon was a Quaker, and Quakers are (defeasibly) pacifists. Nixon was also a Republican, and Republicans are (defeasibly) not. Both inferences are fine. Both fire. They collide. Was Nixon a pacifist?
Horty, Thomason, and Touretzky took this seriously in their 1990 paper A Skeptical Theory of Inheritance in Nonmonotonic Semantic Networks, and the idea is sharp: when defeasible chains conflict, you have a genuine fork, and you must pick a temperament.
- The credulous reasoner picks a side. It’ll hand you an answer — Nixon was a pacifist! — and if you ask again it might cheerfully hand you the opposite.
- The skeptical reasoner refuses to conclude. Two arguments cancel? Then it suspends judgment and says so loudly. The paper argues that in many scenarios this is the grown-up move: the reasoner should never claim a conclusion it didn’t earn.
Here’s why it matters for machines. Every reasoning system has a temperament, whether or not its builders chose one. Your agent has one too — it’s just implicit, undocumented, and drifting with the stochasticity inherent in token generation.
Clinicians have a less flattering name for an invisible credulous temperament: anchoring, or premature closure. The history pulls toward two live explanations and the rushed clinician grabs the first that fits, commits, and stops looking. That’s a credulous reasoner. The skeptical move is to hold both as a conscious policy and order the thing that discriminates between them. The diagnostic-error literature is, more or less, a catalogue of what happens when a high-stakes reasoner is silently credulous. You want that temperament to be a setting and not an unknown.
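Here is a small sketch of what it means for temperament to be a setting. It is a deliberate simplification: the paper works with paths in an inheritance network, whereas this toy just collects the defaults that fire for each proposition. The names `DEFAULTS`, `extensions`, and `skeptical` are invented for the example.

```python
from itertools import product

FACTS = {"quaker", "republican"}
# Defeasible rules: if the fact holds, normally conclude (proposition, value).
DEFAULTS = [
    ("quaker", ("pacifist", True)),
    ("republican", ("pacifist", False)),
]

def extensions(facts, defaults):
    """Credulous view: one extension per way of resolving every conflict."""
    fired = {}
    for precondition, (prop, value) in defaults:
        if precondition in facts:
            fired.setdefault(prop, set()).add(value)
    props = sorted(fired)
    return [dict(zip(props, choice))
            for choice in product(*(sorted(fired[p]) for p in props))]

def skeptical(facts, defaults):
    """Skeptical view: keep only what every extension agrees on."""
    exts = extensions(facts, defaults)
    agreed = {}
    for prop in exts[0]:
        values = {e[prop] for e in exts}
        if len(values) == 1:
            agreed[prop] = values.pop()
    return agreed

print(extensions(FACTS, DEFAULTS))     # [{'pacifist': False}, {'pacifist': True}]
print(skeptical(FACTS, DEFAULTS))      # {}
print(skeptical({"quaker"}, DEFAULTS)) # {'pacifist': True}
```

The credulous reasoner returns one of the two extensions. The skeptical reasoner returns the empty answer for Nixon, and that empty answer is itself a visible, reportable result.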
A short interlude concerning the platypus
I want to slow down on one example, because it’s the one I keep coming back to, and it teaches the single most counterintuitive lesson in this whole essay.
Here is a chain a child could follow. Mammals bear live young: true, unequivocally true; dogs, whales, bats, you. The platypus is a mammal: also true; fur, milk, warm-blooded, the works. Therefore the platypus bears live young. It is viviparous.
The platypus lays eggs. Oviparous.
Notice what didn’t happen. Neither premise was false. The platypus really is a mammal; mammals really are, as a rule, viviparous; the inference was warranted, and yet it produced a flatly wrong conclusion. This is the canonical case from the inheritance-with-exceptions literature (Touretzky built whole formal systems around exactly this shape), and it’s the cleanest possible demonstration that a defeasible generalization is not a broken universal law. It’s a different kind of object entirely. Mammals are viviparous was never a theorem. It’s a default: true unless this case says otherwise, and the platypus is exactly the case that says otherwise.
A black-box reasoner does something quietly tragic when it meets the platypus. It experiences the contradiction as noise — two signals disagree, so it averages them into a noncommittal hedge. Probably viviparous, most likely. That’s the worst possible response, because it treats the single most informative event in the whole exchange as an error.
The contradiction is not the problem. The contradiction is the discovery. The platypus case is your knowledge telling you where its own categories are too coarse. The right response isn’t to soften the rule. It’s to refine the structure: realize that reproductive mode is an axis, that viviparous and oviparous are on it, and that mammals bifurcate. You come out the far side of the contradiction with a better map than you went in with, monotremes filed exactly where they belong. The anomaly upgraded our knowledge.
That completely inverts the black-box instinct. Probabilistic systems treat a surprising contradiction as something to be explained away, averaged. A reasoner that keeps its priors and defeaters as explicit, inspectable objects treats the same contradiction as something to be resolved and learned from. One of them ignores the platypus case. The other says oh, interesting, and goes to reorganize the zoo.
Guess which one is doing science.
The same platypus case, running backwards
The story so far ran top-down, from a category to a property. It’s a mammal; mammals are viviparous by default; therefore, defeasibly, so is this one. That’s a default inference. It hands you a belief to act on, and it’s defeated when a more specific fact (the eggs) overrides the inherited default. Direction of travel: rule → case.
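The rule → case direction can be sketched as inheritance with exceptions. The taxonomy below, the `LOCAL` table, and the `lookup` function are assumptions made for this example. The sketch shows only the general shape, most specific statement wins, and it is not Touretzky’s system.

```python
# A tiny taxonomy: each kind points at its parent category.
PARENT = {"platypus": "monotreme", "monotreme": "mammal",
          "dog": "mammal", "mammal": None}

# Properties stated at each level; a more specific level overrides a general one.
LOCAL = {
    "mammal": {"reproduction": "viviparous"},     # the default
    "monotreme": {"reproduction": "oviparous"},   # the exception
}

def lookup(kind, prop):
    """Return (value, source, overridden); overridden lists defeated defaults."""
    value, source, overridden = None, None, []
    node = kind
    while node is not None:
        stated = LOCAL.get(node, {}).get(prop)
        if stated is not None:
            if value is None:
                value, source = stated, node        # most specific statement wins
            elif stated != value:
                overridden.append((node, stated))   # log the defeated default
        node = PARENT[node]
    return value, source, overridden

print(lookup("dog", "reproduction"))
# ('viviparous', 'mammal', [])
print(lookup("platypus", "reproduction"))
# ('oviparous', 'monotreme', [('mammal', 'viviparous')])
```

The third element of the result is the point. The defeated default is kept as a logged event, not averaged into the answer.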
But hiding inside the same animal is a different reasoning act, running the other way, and it’s abduction. Rewind to 1799, when the first specimen reached the British Museum and the zoologist George Shaw picked it up. He had a bundle of bewildering observations (a duck’s bill, a beaver’s tail, mole-like fur, webbed feet) and reasoned backwards to the hypothesis that would best account for the bundle. His first, entirely sensible abduction: this is a hoax. Some taxidermist in the colonies must have sewn a duck’s bill onto a mole. (The era was thick with stitched-together fakes, so this was a good inference.) Shaw took a pair of scissors to the pelt and went hunting for the stitches. The scientist had generated a hypothesis and gone about testing it.
That difference — belief-to-act-on versus hypothesis-to-go-test — is the heart of it. Peirce, who named abduction, insisted it doesn’t deliver belief at all. What an abduction sanctions is, in his phrase, a reason to suspect — a ticket to investigate and not a conclusion to stow away. The Stanford Encyclopedia files abduction squarely under defeasible reasoning. Abduction is defeasible, but it is not the same thing as defeasible default reasoning. Defeasibility is the property — conclusions can be retracted. Abduction is a particular kind of inference — observations → best explanation — that happens to have that property. Lay the two side by side on the one animal:
- The default act: treat platypus as viviparous until told otherwise. Direction: rule → case. Gives you a belief to act on. Defeated by a more specific fact.
- The abductive act: what’s the best story for this bizarre animal, and let me go check. Direction: observations → explanation. Gives you a hypothesis to investigate. Defeated not by one fact but by a better-explaining rival, weighed across all the evidence at once.
Now watch them meld, because they’re the gears of Peirce’s loop. Abduction proposes the category (“it’s a real animal, and a mammal”). The default inheritance predicts a property (“so: viviparous”). An observation defeats the default (“oviparous”). And that defeat is precisely the anomaly that kicks abduction back into gear: if it’s a mammal and it lays eggs, what kind of mammal explains that? — and the answer, a monotreme, an early-branching lineage that simply never gave up oviparity, is a redrawn category. Notice who did the repair. The default inheritance couldn’t fix itself; it just generated a contradiction. It took abduction — the holistic, weigh-everything-at-once move — to resolve the collision by redrawing the map.
Then there’s the part that should keep anyone building a reasoning machine up at night. The mammals are viviparous default wasn’t merely strong, it was strong enough to be wielded as a weapon against correct evidence. For the better part of a century, Aboriginal Australians and settlers reported that the platypus laid eggs, and European naturalists dismissed them, because the default outranked the testimony. The prior was quietly running an undercutting defeater, in Pollock’s exact sense, against every report that disagreed with it: the witnesses must be mistaken, because mammals don’t lay eggs. It took until 1884, and eggs physically in hand, for the naturalist William Caldwell to break the spell with a famously terse telegram from Queensland: Monotremes oviparous, ovum meroblastic. That’s eighty-five years later! Do not underestimate the perils of strong priors.
That is the precise failure mode to dread in a machine with superhuman priors. An AI scientist’s defaults will be stronger than Shaw’s, mined from a planet of text, and a system that cannot distinguish “I’m applying a default this case might override” from “I’m dismissing a result because it disagrees with my prior” will do to your anomalous-but-correct data exactly what Europe did to the platypus. And a black box genuinely cannot tell those apart. The default getting overridden, the abductive hunch you ought to test, and the over-zealous prior destroying the inconvenient observation—the correct next action differs completely in these scenarios:
- Default overridden? Refine the taxonomy
- Have an abductive suspicion? Go run Caldwell’s experiment — go find the eggs.
When your priors are busy undercutting the testimony that disagrees, that’s exactly where good science is peeking.
The plot twist: deduction is the fragile one
Now I get to ruin your intuitions.
Everybody knows the hierarchy of inference. Deduction sits on top, gleaming, certain — true premises, guaranteed conclusion, no take-backs. Mathematical.
Stephen Biggs and Jessica Wilson, in a 2004 chapter with the gleefully provocative title The Indefeasibility of Abduction, argue the picture is upside down.
The trap is a near-universal assumption: that reasoning is defeasible if and only if it rests on a logically invalid argument. Under that assumption deduction (always valid) is automatically safe, and abduction (always invalid in the strict sense) is automatically risky. Clean. Tidy. Wrong.
Watch what happens when you take Pollock’s rebutting defeaters seriously and apply them to deduction itself. You run a valid deductive argument. The conclusion is absurd — it contradicts something you’re far more sure of than the premises.
This happens constantly in real science: you derive a result so ugly you know in your bones something upstream is broken. What do you do? You don’t accept it. You run the argument backwards as a rebutting defeater against one of your own premises: if the chain is valid and the conclusion is false, a premise must go. That’s reductio ad absurdum, the oldest move in the book — and it means deduction can be defeasible. Its conclusions can be defeated while the inference stays perfectly valid.
Clinicians run this backward move daily. The assay says a normal (D-dimer) blood test rules out a clot. The result is normal. The valid deduction says: no clot. But the patient in front of you looks exactly like a pulmonary embolism, and your confidence in that gestalt outranks your confidence that the assay’s premises hold for this patient, so you run the chain backwards and doubt a premise rather than your clinical intuition and observation. The deduction was valid, and you defeated it anyway because something more holistic outranked it.
Abduction, by contrast, Biggs and Wilson argue, is holistic: when you infer the best explanation, you’re already weighing the whole field of rival explanations against all the evidence at once. There’s no lone exposed conclusion for a rebutting defeater to snipe, because the inference already absorbed everything a rebuttal could raise. New evidence doesn’t rebut an abduction so much as re-run the competition of explanations. That makes abduction, in their phrase, the ultimate arbiter of any domain it operates in: when deduction and abduction disagree, abduction wins, because abduction is what decides which premise gets thrown out. (This is exactly what happened to the platypus: the deductive default couldn’t repair itself but the holistic re-explanation did. Note: not every philosopher buys this; the holism premise is precisely where critics push.)
An AI scientist that treats its formal derivations as bulletproof and its hypotheses as disposable guesses has its epistemics exactly backwards. A system that trusts the prover and discounts the abducer will defend a broken premise to the death because the deductive logic checked out.
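The idea that new evidence re-runs the competition instead of sniping one conclusion can be illustrated with a toy ranking. Everything here is an assumption for the sake of the example: the `EXPLAINS` table is a cartoon of the chest-pain case, not clinical guidance, and the scoring rule (findings explained, then fewest unobserved predictions) is one crude stand-in for “best explanation”, not Biggs and Wilson’s account.

```python
# Which findings each rival hypothesis would explain (illustrative only).
EXPLAINS = {
    "heart attack": {"chest pain", "sweating", "left arm radiation"},
    "aortic dissection": {"chest pain", "sweating", "tearing pain",
                          "pain through to back", "unequal arm pulses"},
}

def best_explanation(findings):
    """Rank every rival against all findings at once; return (winner, ranking)."""
    def score(hypothesis):
        explained = len(findings & EXPLAINS[hypothesis])
        unobserved = len(EXPLAINS[hypothesis] - findings)  # predicted, not seen
        return (explained, -unobserved)
    ranked = sorted(EXPLAINS, key=score, reverse=True)
    return ranked[0], [(h, score(h)) for h in ranked]

findings = {"chest pain", "sweating", "left arm radiation"}
print(best_explanation(findings)[0])   # heart attack

# New evidence arrives: nothing is patched, the whole competition is re-run.
findings |= {"tearing pain", "pain through to back", "unequal arm pulses"}
print(best_explanation(findings)[0])   # aortic dissection
```

The second call does not edit the first answer. It recomputes the ranking over every rival and every finding, and the former winner loses to a hypothesis that explains more.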
So is any of this buildable?
Enter the paper that belongs on more whiteboards: Bondarenko, Dung, Kowalski, and Toni, An abstract, argumentation-theoretic approach to default reasoning (1997). This is the one with a whole zoo of scary formalisms, but here is the crux of it.
The machine has shockingly few parts:
- A boring, monotonic base logic — the settled facts you’re not negotiating.
- A set of assumptions: the defeasible leaps of the form “in the absence of evidence otherwise, believe this.”
- An assumption can be attacked if its contrary can be derived, possibly with help from other assumptions.
- A set of assumptions is admissible if it’s conflict-free (doesn’t attack itself) and defends itself — it can counter-attack whatever attacks it.
That’s the whole substrate. Beliefs are admissible sets of assumptions, defended. Everything above lives here naturally: Pollock’s defeaters are attacks; an undercutting defeater attacks the assumption that licenses an inference rather than the conclusion. The Nixon Diamond is two assumption-sets attacking each other symmetrically, and credulous-versus-skeptical is just which extension you compute — a maximal defended set, or only what survives in every defended set.
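To show how small the machine is, here is a brute-force sketch of those four parts, run on the Nixon Diamond. It is a sketch under stated assumptions, not the paper’s algorithm: the rule encoding, the assumption names (`normal_quaker`, `normal_republican`), and the exhaustive subset enumeration are choices made here, and the enumeration only works for toy-sized examples.

```python
from itertools import combinations

# Monotonic base logic: (head, body) means head holds when all of body holds.
RULES = [
    ("quaker", ()), ("republican", ()),
    ("pacifist", ("quaker", "normal_quaker")),
    ("not_pacifist", ("republican", "normal_republican")),
]
ASSUMPTIONS = {"normal_quaker", "normal_republican"}
# The contrary of an assumption is the sentence that defeats it.
CONTRARY = {"normal_quaker": "not_pacifist", "normal_republican": "pacifist"}

def derive(assumed):
    """Forward-chain the rules from a set of assumptions."""
    known, changed = set(assumed), True
    while changed:
        changed = False
        for head, body in RULES:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known

def attacks(attacker, target):
    """A set attacks another if it derives the contrary of one of its members."""
    derived = derive(attacker)
    return any(CONTRARY[a] in derived for a in target)

def subsets(items):
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def admissible(candidate):
    if attacks(candidate, candidate):        # not conflict-free
        return False
    for attacker in subsets(ASSUMPTIONS):
        if attacks(attacker, candidate) and not attacks(candidate, attacker):
            return False                     # an attack went unanswered
    return True

def preferred():
    """Maximal admissible sets: a credulous reasoner commits to one of these."""
    adm = [s for s in subsets(ASSUMPTIONS) if admissible(s)]
    return [s for s in adm if not any(s < t for t in adm)]

def skeptical_conclusions():
    """Sentences derivable from every preferred set."""
    return set.intersection(*(derive(s) for s in preferred()))

print(preferred())
print(skeptical_conclusions())
```

The two preferred sets are `{normal_quaker}` and `{normal_republican}`. The skeptical conclusions are just `quaker` and `republican`, with nothing about pacifism either way.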
The bookkeeping for explicit, auditable, defeater-aware reasoning was specified in this work, with semantics and existence theorems. We don’t lack the theory. We ignored it because next-token prediction was easier and, for a while, more impressive.
How do we do science in practice?
Now assemble the whole thing into the picture Delrieux was building toward, where an AI scientist becomes something you could actually design.
Delrieux models a theory the way Imre Lakatos did: as a research programme with a defended structure.
- At the center is the hard core: Newton’s laws, say, or the genetic code.
- Around it, a protective belt of auxiliary hypotheses — the negotiable assumptions that take the hits so the core doesn’t.
- A negative heuristic: when bad evidence arrives, don’t aim it at the core — absorb it into the belt.
- A positive heuristic: proactively grow and systematize the belt, ideally turning yesterday’s ad hoc patch into tomorrow’s principled consequence of the core.
So what happens if an observation arrives that the theory doesn’t merely fail to predict but actively forbids? A surprising observation is one the theory is silent on. An anomalous one is something it positively rules out.
When the anomaly hits, the programme faces a choice, and the entire health of the science rides on how it’s made:
- Absorb it into the belt. This is what Ptolemaic astronomy did for centuries: every time the planets misbehaved, it added another epicycle, circle upon circle.
- Let it reach the core. Admit the central commitment is in trouble. Copernicus.
The difference between a progressive programme and a degenerating one — between science and increasingly desperate bookkeeping — is how it routes incoming anomalies through its own defended structure.
To my taste, that routing decision is the science. It is the single most important reasoning act a scientist performs.
It runs on a doctor’s exam table every day too. A working diagnosis is a little research programme. The diagnosis is the hard core. The protective belt is every auxiliary move that explains away what doesn’t fit: the pain isn’t improving because the patient is deconditioned; the numbness is just referred; and so on. Each is a legitimate auxiliary hypothesis, and each is exactly the epicycle a degenerating diagnosis hides behind. The anomaly that should be allowed to reach the core is the progressive weakness, the night sweats, the patient who fails to respond the way the diagnosis insists they must.
When your agent meets an anomalous result, something in there decides whether to quietly patch the hypothesis (grow the belt) or question the framing (touch the core). It makes the Ptolemy-or-Copernicus call on every observation. You can’t audit the routing, because there’s no structure amenable to such introspection.
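What an auditable routing decision might look like can be sketched briefly. The `Programme` class and its acceptance rule are hypothetical. The rule, that a patch enters the belt only if it brings at least one new testable prediction, is a blunt simplification of the progressive-versus-degenerating distinction, chosen for the example and not taken from Delrieux’s formal model.

```python
from dataclasses import dataclass, field

@dataclass
class Programme:
    hard_core: str
    belt: list = field(default_factory=list)   # accepted auxiliary hypotheses
    log: list = field(default_factory=list)    # audit trail of routing calls

    def route(self, anomaly, patch=None, new_predictions=()):
        """Absorb an anomaly into the belt, or escalate it to the core.

        Assumed policy: a patch with no new testable prediction is treated
        as an epicycle, so the anomaly is escalated instead of absorbed.
        """
        if patch and new_predictions:
            self.belt.append(patch)
            decision = "belt"
        else:
            decision = "core"
        self.log.append({"anomaly": anomaly, "decision": decision,
                         "patch": patch, "predictions": list(new_predictions)})
        return decision

dx = Programme(hard_core="mechanical low back pain")
first = dx.route("pain is not improving",
                 patch="patient is deconditioned",
                 new_predictions=["improves with graded activity"])
second = dx.route("night sweats", patch="unrelated to the back")
print(first, second)   # belt core
for entry in dx.log:
    print(entry)
```

Whatever policy you prefer, the useful property is the log: every Ptolemy-or-Copernicus call is written down with the patch that was offered and the predictions it did or did not make.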
The glass box is what’s missing
Before I poke at things and play the contrarian: the current crop of AI scientists is pretty impressive. These tools have made real, validated findings. However, most of them are built on tree search, and a tree search is an explore-versus-exploit machine. It is structurally incapable of representing which logical branch is in conflict with which other, and which defeater decides between them. When a result contradicts an earlier assumption, the tree doesn’t route the anomaly through a defended structure.
A 2026 case study of autonomous-research frameworks found that every system produced what the authors politely called sophisticated hallucinations, and — the killer — that inside multi-agent pipelines those hallucinations get structurally integrated into plans and write-ups, so you can no longer separate genuine computation from confident fabrication. Their recommendation? Explicit separation between speculative and computed statements.
A critical analysis of Sakana AI’s AI Scientist noted, “It heavily depends on user input, struggles with methodological soundness, and lacks the ability to critically assess its own results. […] AI models inherit biases from historical data and cannot independently distinguish between scientific quality and consensus.”
And that is precisely my worry. The frontier systems reinvented the choreography of defeasible argumentation — propose, critique, debate, rank, evolve — then discarded the bookkeeping that would make it auditable, keeping a score where a defended structure should be.
Couldn’t we do better?
What I have been building
I have been playing with these ideas for a while now in collaboration with a small team. I decided to focus first on an AI medical assistant in the field of Physical Therapy (PT). We narrowed it further to a school of PT called Mechanical Diagnosis and Therapy (MDT, the McKenzie Method).
Let me walk through some of the key ideas. My criteria for a true glass-box AI agent are that:
- it should be able to organize knowledge and hypotheses in a hierarchy (of general to specific) as we do in our brains
- it should be able to track evidence states defeasibly across those hierarchies while recording their contradictions (Nixon diamonds)
- its defeasible reasoning should be auditable through the temporally evolving evidence world of the clinician-patient
- it should guarantee, within a world of hypotheses, that the deductive, inductive, and abductive reasoning paths can be systematically explored and can cover the complete set of hypotheses.
Well, what is the space of evidence? Let’s start with a clinical example.
Let the hypothesis H be: This patient is allergic to penicillin.
| Notation | Evidence State | Meaning |
|---|---|---|
| H⁺ | Asserted: data/source affirms H | Documented prior anaphylaxis to penicillin. The evidence establishes the allergy as true. |
| H⁻ | Rejected: source affirms ¬H | Allergy testing negative. Patient later tolerated amoxicillin. |
| H⁰ | Source-silent: no statement (open world) | Allergy field in chart was left blank, which does not mean no allergy. |
| φ | Unresolved: conflicting support, unsettled | One note says allergic, another says patient tolerated it. |
If we could record the evidence state of every hypothesis, it would create the right mathematical structure for defeasible reasoning across the three types (deduction, induction, abduction).
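As a hedged illustration of how the four states could be recorded and merged across sources, here is a minimal sketch. The `Evidence` enum mirrors the table. The `combine` rule (silence adds nothing, agreement is kept, disagreement becomes unresolved) is an assumption made for this example and is not a description of the data structure we actually built.

```python
from enum import Enum
from functools import reduce

class Evidence(Enum):
    ASSERTED = "H+"      # source affirms H
    REJECTED = "H-"      # source affirms not-H
    SILENT = "H0"        # source says nothing (open world)
    UNRESOLVED = "phi"   # conflicting support

def combine(a, b):
    """Merge two reports about the same hypothesis (assumed rule)."""
    if a is Evidence.SILENT:
        return b                  # silence never counts as a denial
    if b is Evidence.SILENT:
        return a
    if a is b:
        return a                  # agreement is kept
    return Evidence.UNRESOLVED    # any disagreement stays visible

def state_of(reports):
    """Fold every report into one evidence state; no reports means silent."""
    return reduce(combine, reports, Evidence.SILENT)

# H: this patient is allergic to penicillin.
blank_chart = [Evidence.SILENT]
conflicting_notes = [Evidence.ASSERTED, Evidence.SILENT, Evidence.REJECTED]
print(state_of(blank_chart))        # Evidence.SILENT
print(state_of(conflicting_notes))  # Evidence.UNRESOLVED
```

The design choice worth noticing is that a blank chart field and a negative allergy test land in different states, so an absence of evidence can never be read as evidence of absence.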
Picture a deterministic tool that sits exactly where the agent’s reasoning currently is inscrutable, and instead makes the whole Peircean loop into first-class, inspectable objects:
- Hypotheses that are explicit assumptions — things you can name, version, and attack.
- Defeaters that carry their type (this evidence rebuts the conclusion and that one undercuts the experiment that produced it) so an anomaly does not collapse into regression-to-the-mean behavior by the agent.
- A conflict-resolution temperament you set on purpose — skeptical when stakes are high, credulous when brainstorming — instead of one that drifts with the agent LLM you use.
- A hard core and a protective belt that actually exist as structure, as a governance policy that cannot be violated. When an anomalous result arrives you can watch the system decide where it lands — and overrule it when it’s about to quietly bury a result that should have rattled the core.
- A system that treats a contradiction correctly: not as noise to be averaged away, but as a signal that the categories need refining — a collision between a default and a fact logged as a constructive event, the thing that makes the hypothesis world better.
The glass box answers the two questions: this hypothesis took the hit, the defeat was an undercut — the confound was in this evidence, not the hypothesis or theory — and here is the one experiment that would put it back. A proper argument.
The details of the data structure we built are a story for another day, and frankly the internals are the least interesting part. The interesting part is the shape of the gap, and the gap is this:
You may know the story of the physicist Pauli. According to Peierls, on seeing a paper by a young physicist, Pauli remarked sadly, “It is not even wrong.”
Being wrong requires defeasible reasoning. We don’t want our AI scientists and AI medical assistants to be not even wrong. We really don’t want Pauli to turn in his grave.
I have tried to link most of the papers and links in the post. All underlines are links you can follow.














