Academic publishing built a cathedral. The question nobody wants to answer is how much of the foundation is load-bearing.
CAS Newton launched this week. It is CAS's new agentic AI system, built to reason autonomously across the scientific literature rather than simply index it. The Chemical Abstracts Service, which manages one of the most authoritative chemistry and life sciences databases on the planet, has moved from "AI helps you find things" to "AI reasons on your behalf across the corpus." That is not a product update. That is a different product category, with a different dependency on the underlying data, and a risk profile that the industry has not yet begun to articulate clearly.
The timing is instructive. In the same week that CAS Newton shipped, Retraction Watch managing editor Kate Travis testified before the U.S. House Science, Space and Technology Committee's Investigations and Oversight Subcommittee, warning that "publish or perish" culture has produced a flood of AI-generated papers filling journals at scale. And BMJ's Journal of Medical Genetics finished retracting seven of eight papers from a 2019 special issue: papers that cleared human editorial review at the time and were only flagged years later when AI-powered integrity screening tools caught what reviewers missed.
Put those three facts together and you get a picture that should keep every publisher CPTO awake at night: we are building AI systems that reason across a corpus, and that corpus has a documented, growing, not-yet-quantified integrity problem.
This is not a hypothetical risk. It is an operational one, accumulating in real time.
The Agentic Turn Changes the Stakes
For most of the AI-in-publishing conversation, the industry has been focused on AI as a workflow tool. Springer Nature processed 3.1 million manuscript submissions through its Snapp platform in 2025 with AI integrity screening embedded at the workflow level. Aries Systems partnered with Integra this month to bring AI-driven quality and integrity checks directly into Editorial Manager — the submission platform underlying a significant fraction of the world's peer-reviewed journals. AAAS is piloting full AI automation of MDAR reporting checklists at manuscript submission across the Science family of journals, via DataSeer. Taylor & Francis renewed its DataSeer partnership this week specifically to automate data policy compliance at the journal level. These are all workflow tools. They operate on individual manuscripts, flagging anomalies before publication. That is good and necessary work.
CAS Newton is something different. It is not a screening tool for new manuscripts. It is a reasoning engine operating across accumulated knowledge — across the entire existing corpus of chemistry and life sciences literature. And the accumulated corpus is not clean.
The distinction matters enormously because of how AI systems fail. A manuscript screening tool that encounters a suspect paper can flag it. An agentic AI that reasons across the literature and encounters the same suspect paper will incorporate its claims into its reasoning chain — silently, at scale, without any flag being raised. The error doesn't get caught at intake. It propagates through every downstream reasoning task that touches that domain.
This is not an attack on CAS or on agentic AI as a concept. CAS has spent decades building and curating one of the most rigorous scientific databases in existence. If anyone has the curatorial infrastructure to make agentic AI over the literature work responsibly, CAS is on the short list. The point is systemic: even the best-curated corpus in the world contains papers that passed peer review in 2019 and were retracted in 2026 because the detection tools that would have caught them didn't exist yet.
What's Actually In There
The BMJ retraction story deserves more analysis than it has received. BMJ's Journal of Medical Genetics retracted seven of eight papers from a 2019 special issue this week after AI-powered integrity tools flagged compromised peer review and what the journal characterized as "improbable device use." The papers had been in the published record for seven years. They accumulated citations. Some of those citations are in other papers, which are cited in other papers. The cascade effect of compromised work in the scientific record is not hypothetical — it is demonstrably how science actually propagates, including its errors.
This is one incident, but not an isolated one. Springer Nature flagged a paper earlier this year containing an AI-hallucinated reference: a citation to a paper that does not exist, attributed to Retraction Watch's own co-founder. A mathematician submitted a wholly ChatGPT-generated paper on pregnancy cravings and prime numbers and got it published. Retraction Watch testified to Congress this week that AI-generated papers are flooding journals at scale. These are data points from a distribution that nobody has fully characterized yet.
The honest answer to "how much compromised work is in the published record?" is that we don't know, because the tools capable of detecting it at scale have only recently come online. We are at the beginning of a retroactive audit process that has no defined end. Every month that better detection tools deploy, more historical work will be re-evaluated against new standards. The backlog is not a fixed number. It is growing as the tools improve.
An AI system reasoning across this corpus is reasoning across an unknown contamination level. That is not a small thing.
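To make the citation cascade concrete, here is a minimal sketch of how exposure to a retracted paper spreads through a citation graph. Everything in it is illustrative: the paper identifiers, the `cited_by` mapping, and the `downstream_of` helper are hypothetical, not drawn from any publisher's or CAS's systems. It only counts which papers sit downstream of a retraction. It says nothing about whether a given citing paper actually relied on the compromised claim.

```python
from collections import deque


def downstream_of(retracted, cited_by):
    """Return every paper that cites a retracted paper, directly or transitively.

    retracted: set of paper IDs known to be retracted.
    cited_by:  dict mapping a paper ID to the list of papers that cite it.
    """
    seen = set()
    queue = deque(retracted)
    while queue:
        paper = queue.popleft()
        for citer in cited_by.get(paper, ()):
            # Skip papers already counted and the retracted papers themselves.
            if citer not in seen and citer not in retracted:
                seen.add(citer)
                queue.append(citer)
    return seen


# Hypothetical toy graph: each key is cited by the papers in its list.
cited_by = {
    "R1": ["A", "B"],  # R1 is the retracted paper
    "A": ["C"],
    "B": ["C", "D"],
    "E": ["F"],        # unrelated branch, never touched by R1
}

affected = downstream_of({"R1"}, cited_by)
print(sorted(affected))  # ['A', 'B', 'C', 'D']
```

One retraction reaches four papers in a six-paper toy graph. In a real corpus the same breadth-first walk would run over millions of edges, which is why the contamination level is hard to state as a single number.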
The Infrastructure Is Catching Up — Slowly
There is genuine progress to report. AI screening tools now exist and are being embedded into major publishing workflows. Springer Nature's 75-person research integrity team, augmented by AI tools running across 3.1 million submission events annually, represents a model that is meaningfully better than what the industry was doing five years ago. The Aries/Integra integration means that AI integrity checking is moving, in the company's own framing, from "bolt-on to built-in." The AAAS/DataSeer pilot is AI-assisted compliance verification at one of the most scrutinized journals in the world.
These are tools designed to catch problems before they enter the record. They are necessary and the industry is right to build and deploy them.
But the tools that catch problems before publication do not solve the problem of what is already published. For that, you need retroactive screening, which is exactly what is now surfacing the BMJ retractions, the Springer Nature hallucinated-reference incident, and the rest of the growing backlog. And retroactive screening creates its own accountability problem: papers published under older, weaker standards are now being measured against current detection capabilities. The authors involved face consequences for work that cleared the bar at the time it was submitted. The policies governing how those retroactive findings get handled (who gets notified, what the escalation path looks like, what happens to citing papers, how institutions are informed) are written down almost nowhere.
The integrity infrastructure is building from the intake side. The backlog is building from the other direction. The two have not yet met in the middle, and nobody at the industry level is designing the bridge.
What This Means for the AI Licensing Deals
The AI content licensing market has moved fast. Springer Publishing Company announced a partnership with Cashmere this week to manage explicit consent governance for LLM training use of healthcare education content. This is part of a broader market structure that has emerged in the last eighteen months: publishers licensing their content for AI training, with varying degrees of quality and verification assurance built into the terms.
PLOS CEO Alison Mudditt made an argument last month that deserves to be quoted directly in this context: the AI era doesn't just use open access content, it depends on the trustworthiness signals that good OA practice produces — open data, rich metadata, transparent retraction workflows. She is right, and the argument extends beyond OA. AI systems are only as reliable as the corpus they are trained on or reasoning across. If that corpus contains undisclosed AI-generated content, compromised peer review, fabricated references, and a long tail of papers that would not survive modern detection — and it does, to a degree we cannot yet measure — then the AI systems downstream inherit that contamination.
The licensing deals being signed now are largely silent on this. They specify what content can be used, for what purposes, with what attribution and compensation to rights holders. They do not, as a general matter, specify anything about the integrity status of the content being licensed, or what obligations attach to the licensor if that content is subsequently found to contain integrity issues. That is a significant gap, and it is one the industry will not be able to ignore for much longer as agentic AI systems start producing outputs that are traceable to specific corpus decisions.
The Counterargument
The argument against worrying about this is essentially: the literature has always had errors, retractions have always existed, and science has always been a self-correcting system. Using AI to accelerate that self-correction is better than the pre-AI baseline, even if the current transition is messy.
This is partly right. The pre-AI baseline for integrity screening was genuinely inadequate, and the tools being deployed now (from Springer Nature's Snapp integration to Elsevier's expanded Check Integrity screening to the Aries/Integra partnership) represent a real improvement. The fact that AI tools can now surface compromised peer review from 2019 is a feature, not a bug, even if the workflow implications are painful.
The limit of this argument is the scale and speed of AI reasoning deployment. When scientific knowledge correction was human-mediated and journal-by-journal, errors propagated slowly enough that the self-correction mechanism could, most of the time, keep pace. When agentic AI systems are reasoning across the full corpus and producing outputs at machine speed, errors propagate faster than the correction mechanism can currently operate. The question is not whether science self-corrects. It is whether it can self-correct faster than AI systems can propagate the errors being corrected.
There is no evidence that it currently can.
What Needs to Happen
The gap here is not primarily technological. The tools exist. The gap is infrastructure: specifically, the absence of a real-time, machine-readable, comprehensive retraction and correction feed that AI systems can use to maintain corpus hygiene as they operate. Retraction Watch has been building this for fifteen years against significant institutional resistance, and testified to Congress about it this week. The scholarly metadata infrastructure (Crossref, COPE, publisher retraction workflows) remains fragmented enough that a paper marked as retracted in one database can still be found and cited through others, with no flag attached, for months or years.
For an AI system reasoning autonomously across the scientific literature, that fragmentation is not an inconvenience. It is a systematic bias toward including compromised work in its reasoning chain.
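As a rough illustration of what consuming such a feed could look like, the sketch below checks the sources behind a reasoning step against a set of retracted DOIs before they are used. It is a sketch under stated assumptions, not a description of how CAS Newton or any real system works. The feed is assumed to be a plain collection of DOI strings, the `sources` records and their fields are hypothetical, and the DOIs are invented examples.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common prefixes so two spellings compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def split_by_retraction_status(sources, retracted_dois):
    """Partition sources into (usable, excluded) using a set of retracted DOIs.

    sources:        list of dicts, each with at least a "doi" key (hypothetical shape).
    retracted_dois: iterable of DOI strings from an assumed retraction feed.
    """
    retracted = {normalize_doi(d) for d in retracted_dois}
    usable, excluded = [], []
    for source in sources:
        if normalize_doi(source["doi"]) in retracted:
            excluded.append(source)
        else:
            usable.append(source)
    return usable, excluded


# Invented example data; these DOIs do not refer to real papers.
feed = ["10.5555/example.2019.001", "https://doi.org/10.5555/EXAMPLE.2019.002"]
sources = [
    {"doi": "10.5555/example.2019.001", "title": "Retracted special-issue paper"},
    {"doi": "doi:10.5555/example.2019.002", "title": "Second retracted paper"},
    {"doi": "10.5555/example.2021.010", "title": "Paper with no retraction notice"},
]

usable, excluded = split_by_retraction_status(sources, feed)
print([s["title"] for s in usable])  # ['Paper with no retraction notice']
print(len(excluded))                 # 2
```

The filter is only as good as the feed behind it. If a retraction is recorded in one database and missing from the feed the system reads, the paper passes through as usable, which is the fragmentation problem in miniature.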
The publishing industry has spent the last three years debating AI policy at the manuscript level — what authors can use, what editors must disclose, what reviewers are permitted to do. That conversation is necessary, and Frontiers' new guidance framework published this week is a meaningful contribution to it. But it is focused entirely on the intake problem, on what enters the record going forward. It says almost nothing about what is already in the record, how AI systems should interact with it, and who bears responsibility when an AI reasoning engine produces an output grounded in a retracted paper.
That is the conversation the industry has not yet had. CAS Newton is the reason it can no longer be deferred.
The corpus is the product now. Its integrity is a product quality problem. And unlike manuscript screening, you cannot solve a product quality problem by inspecting only new inputs.