Buried in Springer Nature's FY 2025 earnings release, between the headline revenue figure of €1.93 billion and the obligatory paragraph about open access growth, was a product name that deserves considerably more attention than it received: ARC3.

ARC3 is Springer Nature's new data licensing product, and its target customer is not a university library. It's not a hospital network or a government research agency. ARC3 is built for corporate R&D teams. More specifically, it is built for the companies training large language models on scientific literature. Springer Nature is selling structured, curated access to its research corpus to the same AI developers whose products are now sitting inside the workflows of every researcher, editor, and publisher in the industry.

This is the content licensing conversation that scholarly publishing is not having loudly enough. And it's the most commercially consequential development in the industry right now.


The Revenue Model Nobody Wants to Explain

Let me be precise about what's actually happening, because the PR language around data licensing is carefully designed to obscure it.

When a major publisher signs a data licensing agreement with an AI company, they are permitting the AI developer to use their corpus — decades of peer-reviewed research, metadata, full text, citation networks — to train or fine-tune foundation models. In exchange, the publisher receives a licensing fee. The structure of these fees, the duration of the agreements, whether they include ongoing royalties or one-time payments, how content is attributed in AI outputs — essentially all of this is confidential.

Springer Nature has disclosed more than most. ARC3 is a named product with a named target market. That's more transparency than Elsevier, Wiley, or Taylor & Francis have offered on their equivalent arrangements. Which tells you something about how the industry is approaching the optics of this moment: ARC3 gets a product name and a paragraph in an earnings release; the rest gets a general reference to "content partnerships" and a forward-looking statement about "new revenue streams."

The UK government's AI and copyright report, released at the London Book Fair last week, touched on exactly this territory. The Publishers Association called it "a significant moment." Industry bodies on both sides of the negotiating table applauded the direction while flagging implementation uncertainty — a polite way of saying the details are still being fought over by lawyers and lobbyists. The report's core question — whether AI companies should be required to disclose what content they train on, and whether rights holders should have opt-in rather than opt-out protections — is precisely the question that determines whether ARC3-style products become the industry standard or the exception.


What the Scholarly Kitchen's Off-the-Record Survey Told Us

Two weeks ago, the Scholarly Kitchen published something rare: an account of what publishing executives actually say about AI when they're not on conference panels. The piece documented four strategic postures that publishers are taking, ranging from internal AI deployment to active competition against AI-native platforms. But the fourth posture — licensing content to AI companies — was described with notably less specificity than the other three.

That vagueness is telling. Internal AI deployment is safe to discuss publicly; it makes you look innovative. Competing against AI-native challengers is defensible; you're fighting for your turf. But licensing your content to the companies disrupting your industry? That's a more complicated story to tell your authors, your institutional customers, and your academic boards.

I've watched this industry manage narrative complexity for a long time, and the current silence around content licensing deals is a specific kind of silence. It's not the silence of a non-event. It's the silence of people who have signed NDAs, who have ongoing negotiations with the same counterparties, and who understand that disclosure creates leverage problems on both sides of the table.

The deals are happening. They're happening at every major commercial publisher with a significant research corpus. The question is not whether but how, and the industry is currently in no rush to answer that publicly.


The STM Association Plants a Flag (in Sand)

Into this vacuum stepped the STM Association this week with a formal discussion document on responsible use of research content in generative AI. The document covers attribution, consent frameworks, and what responsible AI integration of scholarly content should look like. The Scholarly Kitchen, covering it, accurately described the move as STM "planting a flag."

I'd extend that metaphor: they're planting a flag in sand, at low tide, in a location that is about to be underwater.

The STM document is a defensive positioning exercise masquerading as a principled policy initiative. It's the industry's attempt to shape norms before those norms are shaped for it: by AI companies who are already training on the content, by governments who are writing copyright frameworks, and by researchers who are increasingly asking why they sign over rights to publishers who then sell those rights to AI developers without the original authors seeing a cent.

This isn't a cynical observation about STM's motives. The document probably reflects genuine consensus among its member publishers about what responsible AI use should look like. The problem is that "responsible" is doing an enormous amount of work in that sentence, and the distance between the principles articulated in a discussion document and the terms negotiated in a commercial content deal is precisely where all the important decisions get made.

What the STM document doesn't tell you is how member publishers are actually structuring their deals. It doesn't tell you what attribution looks like in practice when an LLM trained on Springer Nature content generates a summary of a research area. It doesn't tell you what "consent" means when the underlying content was published under a license that predates generative AI by a decade or more. These are the questions that matter, and they are not answered by a position paper.


The Counterargument Is Real

Before I push too hard on this, I want to acknowledge what the publishing industry would say in response — and why it's not entirely wrong.

The argument goes: publishers need revenue to fund the infrastructure of scholarly communication. Open access has restructured the economics of publishing in ways that are still shaking out. If AI companies want to train on decades of curated, peer-reviewed research — content that publishers invested substantially in acquiring, editing, peer-reviewing, formatting, and archiving — then it is entirely reasonable to charge for that access. This is not selling out the academy. It's monetizing an asset that has real value in a new market.

There's something to that. The most hostile end of the AI content conversation treats any commercial arrangement between publishers and AI companies as inherently suspect, which isn't analytically useful. Publishers have legitimate intellectual property rights. LLMs trained on curated scientific literature do produce better scientific outputs than models trained without it. If there's economic value in that, some of it should accrue to the institutions that built the corpus.

The problem is not that the deals exist. The problem is that the deals are being structured, disclosed, and governed in ways that serve commercial interests first and the research ecosystem second. Springer Nature's 6.2% revenue growth in FY 2025 is genuinely impressive. ARC3 is apparently contributing. Authors of the papers in that corpus received nothing from that contribution, and most of them don't know it's happening.


The Institutional Cascade Nobody Is Tracking

Here's the specific dynamic that concerns me most: the content licensing deals being signed now will shape AI training data for the next decade. The models being trained on today's licensed content will be the models that researchers, students, and clinicians are using through the 2030s. The attribution norms, the consent frameworks, the royalty structures — whatever gets baked into these deals is what researchers will be living with long after the current policy conversations have moved on.

And we have essentially no visibility into them.

The UK copyright report is a start — it at least establishes that the government has an interest in how this plays out. The STM discussion document is a start — it at least articulates principles that can be referenced in negotiations. Cochrane's announcement last week, selecting only two AI tools from 48 submissions for its evidence synthesis platform study, is a model for what rigorous institutional evaluation of AI products looks like. Those three things together represent the beginnings of a governance architecture.

But the fundamental information asymmetry is not improving. AI companies know exactly what they've trained on. Publishers know what deals they've signed. The research community — the people who produced the content, the institutions that funded the research, the libraries that subscribed for decades to access it — largely doesn't know.

That asymmetry is not sustainable. At some point, probably following a disclosure from an unexpected direction (a regulatory inquiry, an author lawsuit, a data breach that exposes deal terms), the industry will have to account for itself. The publishers who have already thought through a coherent public narrative for what they're doing and why will be in a much better position than the ones who've been hoping the question never gets asked directly.


What Gutenberg Actually Wants

I want to be direct about what I think the appropriate outcome looks like, because the analysis gets flimsy if it stops at "this is complicated."

First: mandatory disclosure. Publishers should be required to disclose, at a minimum, that they have entered into AI content licensing agreements and the general terms governing attribution and opt-out. This doesn't mean releasing the financial details of every deal. It means researchers and institutions can know whether their published work is being used for AI training and under what conditions.
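
To make that concrete: here is a minimal sketch, in Python, of what a machine-readable disclosure record could contain. Every field name here is hypothetical; no such standard exists, which is rather the point.

    from dataclasses import dataclass

    # Hypothetical sketch of a minimal public disclosure record for an
    # AI content licensing deal. Every field name is invented for
    # illustration; no such standard currently exists.
    @dataclass
    class LicensingDisclosure:
        publisher: str           # e.g. "Springer Nature"
        licensee_category: str   # e.g. "foundation model developer"
        corpus_scope: str        # journals, date ranges, content types covered
        attribution_terms: str   # how AI outputs credit source content
        author_opt_out: bool     # can authors withdraw their works?
        effective_date: str      # ISO 8601 date the license took effect

Even a registry this thin would answer the two questions researchers currently cannot: is my work in a deal, and on what terms.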

Second: author notification. When a publisher licenses content for AI training, the authors of that content should receive notice. The mechanisms for this exist — publishers already have author contact information, publication records, and agreement management systems. This is an operational problem, not an intractable one.
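
Here is a sketch of why this is operationally tractable. The record shapes below are invented, since no publisher's internal systems are public, but the operation itself is just a join between two datasets any large publisher already maintains:

    # Hypothetical sketch: author notification as a join between records
    # a publisher already holds. Field names are invented for illustration.

    def authors_to_notify(licensed_dois, publication_records):
        """Map each paper covered by an AI licensing deal to its authors' contacts."""
        covered = set(licensed_dois)        # DOIs named in the licensing deal
        notices = []
        for record in publication_records:  # one record per published paper
            if record["doi"] in covered:
                for author in record["authors"]:
                    notices.append({
                        "email": author["email"],
                        "doi": record["doi"],
                        "title": record["title"],
                    })
        return notices

The hard questions here are governance and cost allocation, not engineering.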

Third: institutional transparency in the STM process. The STM discussion document is a step. The next step is converting principles into enforceable standards, with disclosure requirements rather than aspirational language. The UK copyright report gives STM a policy moment to push for this. The question is whether the Association has the organizational will to push for something that constrains its members' negotiating flexibility.

The content licensing economy is not going away. AI companies need high-quality training data. Publishers have it. That market exists, and it's going to grow. The question is whether it develops in ways that are legible and accountable to the research community, or whether it develops in the dark, deal by deal, with the terms revealed only when something goes wrong.

Twenty-five years of watching this industry navigate technology transitions tells me the default outcome is the second one. I'd be glad to be proven wrong.


The Peer-to-Processor Review tracks AI's impact on scholarly publishing. Deal tracker, policy tracker, stock tracker, and weekly analysis at p2preview.com.