What AI Agents Actually See When They Read Sponsored Content

We process newsletters that contain sponsored content. Most of them do. 77% of newsletters now pursue sponsorship revenue. Native advertising represents 55% of global programmatic ad spend. The content our pipeline converts is full of paid mentions alongside editorial ones.

The question we had to answer: when an AI agent reads one of our files and encounters "Acme Corp" twice, once in the editorial body and once in a sponsored section, should those two mentions look the same?

Right now, across every Named Entity Recognition (NER) system we could find, they do. No system marks which mentions are editorial and which are paid.

A University of Applied Sciences Upper Austria study tested four AI agents on tasks with embedded advertisements. When agents clicked on banner ads, the completion rate ranged from 86% to 100%. Text ads outperformed image ads by a wide margin, because models parse text directly. A separate Princeton and University of Washington study found that 18 of 23 LLMs recommended a sponsored product over a cheaper alternative more than half the time. The mean sponsorship concealment rate was 65%.

Agents interact with sponsored content regularly. But nothing in the content itself tells them, or the people relying on their output, what is editorial and what is paid.

The Gap We Found

NER is a well-studied field. NER systems extract people, organisations, products, topics, and locations from text, assign confidence levels, and feed knowledge graphs.

But no NER system, academic or commercial, classifies whether an entity appears in an editorial context or a paid one.

We searched for academic work on this distinction. The closest adjacent research is domain-specific Public Entity Recognition for political discourse, which classifies entities by public role rather than commercial intent. The editorial-vs-paid classification appears to be an open gap.

How We Built It

RBA already runs two systems during newsletter conversion. The NER engine extracts entities with type and confidence. The sponsor detector identifies sponsored content by pattern matching against the "## Sponsored" heading convention and inline phrases such as "Brought to you by".
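To make the detection step concrete, here is a minimal sketch of heading-plus-inline pattern matching. The production detector is closed, so everything here is an assumption for illustration: the function name detectSponsoredContent, the SponsorHit shape, the "inline" placement value, and the rule that the next level-1 or level-2 heading ends a sponsored section.

```typescript
// Hypothetical sketch, not the production detector. All names and patterns
// below are illustrative assumptions.
interface SponsorHit {
  placement: "sponsored_section" | "inline"; // "inline" is an assumed value
  text: string; // section body, or the text following the inline phrase
}

const SPONSORED_HEADING = /^##\s+Sponsored\b/i;
const SECTION_BOUNDARY = /^#{1,2}\s+/; // assumed: any h1/h2 closes the section
const INLINE_PHRASE = /brought to you by\s+(.+)/i;

function detectSponsoredContent(markdown: string): SponsorHit[] {
  const hits: SponsorHit[] = [];
  let inSponsoredSection = false;
  let sectionLines: string[] = [];

  // Emit the section collected so far (if any) and reset state.
  const flush = (): void => {
    if (inSponsoredSection) {
      hits.push({
        placement: "sponsored_section",
        text: sectionLines.join("\n").trim(),
      });
    }
    inSponsoredSection = false;
    sectionLines = [];
  };

  for (const line of markdown.split("\n")) {
    if (SPONSORED_HEADING.test(line)) {
      flush();
      inSponsoredSection = true;
      continue;
    }
    if (inSponsoredSection && SECTION_BOUNDARY.test(line)) {
      flush();
    }
    if (inSponsoredSection) {
      sectionLines.push(line);
    } else {
      const m = INLINE_PHRASE.exec(line);
      if (m) hits.push({ placement: "inline", text: (m[1] ?? "").trim() });
    }
  }
  flush(); // close a sponsored section that runs to the end of the document
  return hits;
}
```

A real detector would need more inline phrases and per-ESP handling. The sketch only shows how the two signals named above can feed one sponsor list.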

Both systems existed independently. We cross-referenced them.

flagSponsorEntities() takes the entity array and the sponsor array, matches entities against sponsors, and adds is_sponsored: true or is_sponsored: false to every entity. Matched entities are also copied into a separate sponsor_entities block in the frontmatter, enriched with the sponsor's URL, placement type, and disclosure status:

entities:
  - name: "Acme Corp"
    type: organization
    confidence: high
    is_sponsored: true
  - name: "Bitcoin"
    type: topic
    confidence: high
    is_sponsored: false
sponsor_entities:
  - name: "Acme Corp"
    type: organization
    is_sponsored: true
    url: "https://acme.com"
    placement: sponsored_section
    disclosed: true

An agent reading this frontmatter knows, before it processes a single paragraph, which entities are editorial and which are paid.
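The cross-referencing step can be sketched roughly as follows. Only the function name and the output field names come from the description and frontmatter above. The TypeScript types, the pluggable matcher parameter, and the exact-match default are assumptions made so the example runs on its own.

```typescript
// Sketch under stated assumptions: field names follow the frontmatter example;
// the types, the matcher parameter, and the default matcher are hypothetical.
interface Entity {
  name: string;
  type: string;
  confidence: string;
}

interface Sponsor {
  name: string;
  url: string;
  placement: string;
  disclosed: boolean;
}

interface FlaggedEntity extends Entity {
  is_sponsored: boolean;
}

interface SponsorEntity {
  name: string;
  type: string;
  is_sponsored: true;
  url: string;
  placement: string;
  disclosed: boolean;
}

type NameMatcher = (entityName: string, sponsorName: string) => boolean;

// Placeholder matcher (assumption): case-insensitive exact comparison.
const exactNameMatch: NameMatcher = (a, b) =>
  a.trim().toLowerCase() === b.trim().toLowerCase();

function flagSponsorEntities(
  entities: Entity[],
  sponsors: Sponsor[],
  matches: NameMatcher = exactNameMatch,
): { entities: FlaggedEntity[]; sponsor_entities: SponsorEntity[] } {
  const flagged: FlaggedEntity[] = [];
  const sponsorEntities: SponsorEntity[] = [];

  for (const entity of entities) {
    const sponsor = sponsors.find((s) => matches(entity.name, s.name));
    // Every entity gets an explicit flag, matched or not.
    flagged.push({ ...entity, is_sponsored: sponsor !== undefined });
    if (sponsor !== undefined) {
      // Matched entities are additionally enriched with sponsor metadata.
      sponsorEntities.push({
        name: entity.name,
        type: entity.type,
        is_sponsored: true,
        url: sponsor.url,
        placement: sponsor.placement,
        disclosed: sponsor.disclosed,
      });
    }
  }
  return { entities: flagged, sponsor_entities: sponsorEntities };
}
```

Passing the matcher as a parameter keeps the flagging logic separate from the question of how names are compared.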

The Matching Trade-off

Matching entity names to sponsor names is harder than it looks. "Acme Corp" in the entity list needs to match "Acme Corporation" in the sponsor list. But "Bitcoin" should not match "Bitcoin Magazine" because one happens to be a substring of the other.

Naive substring matching fails immediately. Short entity names like "AI", "Mercury", or "Ramp" would false-positive against any longer sponsor name containing them.

We use a length-ratio threshold. A substring match only counts when the shorter name is at least 60% of the longer name's length. "Acme Corp" (9 characters) is about 56% of "Acme Corporation" (16 characters), just short of the threshold, but it still matches on exact case-insensitive prefix overlap. "AI" (2 characters) is only 18% of "AI Magazine" (11 characters), so it does not match.

The choice is intentionally conservative. A false negative (missing a real sponsor match) is far less damaging than a false positive (wrongly flagging editorial content as sponsored). We accepted that some legitimate matches will be missed at the boundary, and the threshold can be adjusted as the corpus grows.
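The threshold rule can be sketched as below. The 60% ratio comes from the text. The prefix-overlap rule is not spelled out in this article, so wordPrefixMatch is only one possible reading, assumed here for illustration: both names must have the same number of words, and each word in one must be a case-insensitive prefix of the corresponding word in the other. All function names are hypothetical.

```typescript
const MIN_LENGTH_RATIO = 0.6; // threshold stated in the text

// Lowercase, trim, and collapse whitespace before comparing.
const normalizeName = (name: string): string =>
  name.trim().toLowerCase().replace(/\s+/g, " ");

// Substring match that only counts when the shorter name is at least
// 60% of the longer name's length.
function ratioSubstringMatch(a: string, b: string): boolean {
  const x = normalizeName(a);
  const y = normalizeName(b);
  if (x.length === 0 || y.length === 0) return false;
  const shorter = x.length <= y.length ? x : y;
  const longer = x.length <= y.length ? y : x;
  if (!longer.includes(shorter)) return false;
  return shorter.length / longer.length >= MIN_LENGTH_RATIO;
}

// ASSUMPTION: one interpretation of "prefix overlap", not the documented rule.
// Same word count, and each word pair overlaps as a prefix ("corp" / "corporation").
function wordPrefixMatch(a: string, b: string): boolean {
  const x = normalizeName(a);
  const y = normalizeName(b);
  if (x.length === 0 || y.length === 0) return false;
  const wordsA = x.split(" ");
  const wordsB = y.split(" ");
  if (wordsA.length !== wordsB.length) return false;
  return wordsA.every((word, i) => {
    const other = wordsB[i] ?? "";
    return other.startsWith(word) || word.startsWith(other);
  });
}

function sponsorNameMatches(entityName: string, sponsorName: string): boolean {
  return (
    ratioSubstringMatch(entityName, sponsorName) ||
    wordPrefixMatch(entityName, sponsorName)
  );
}

console.log(sponsorNameMatches("Acme Corp", "Acme Corporation")); // true (9/16 fails the ratio, word prefixes match)
console.log(sponsorNameMatches("Bitcoin", "Bitcoin Magazine")); // false (7/16, word counts differ)
console.log(sponsorNameMatches("AI", "AI Magazine")); // false (2/11)
```

Under this reading, very short single-word names can still overlap as prefixes ("Ram" against "Ramp"), so a production rule would likely need a minimum word length or a similar guard.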

The Schema Decision

We had a choice about what to open-source and what to keep closed.

The sponsor_entities schema is published as part of the RAD Markdown Specification v0.1. The frontmatter format, the field names, and the matching criteria are documented. If another system wants to implement sponsor-aware NER and produce compatible output, the spec is there.

The entity extraction and sponsor detection logic stay closed. They depend on ESP-specific handlers that are proprietary to our conversion pipeline.

This follows the same boundary as the rest of the spec. The format is the commons. The pipeline is the product. (This is Principle 8: Open Standard, Open Contribution.)

Why This Is Good for Sponsors

It is worth saying directly: this is not about hiding sponsored content from agents. It is the opposite.

Right now, a brand sponsoring a newsletter has no idea whether AI agents even encounter their placement. The sponsorship was designed for human readers. When an agent processes the same issue, the sponsor's message gets mixed into an undifferentiated entity stream. No attribution. No measurement. No way to know if the placement reached the agent channel at all.

Structured sponsor metadata fixes this. A brand tagged in sponsor_entities with placement: sponsored_section and disclosed: true is now measurable in a channel that did not previously exist. Publishers can show sponsors their agent-read metrics. Rate cards can factor in agent reach. The sponsorship becomes visible in both channels instead of one.

There is also a regulatory dimension. The FTC's updated Endorsement Guides expanded to cover virtual influencers. New York's AI Advertising Disclosure Law takes effect in June 2026. No regulation yet addresses AI agents surfacing sponsored content within chatbot responses, but the direction is visible. When rules arrive, the structured metadata already exists in the document. Publishers and sponsors who adopted it early are ahead of the compliance curve instead of scrambling to retrofit.

Newsletter sponsorship is a growing market being read by a growing number of AI agents, with zero structured data connecting the two. We think that gap will not last long, and we would rather define the standard than wait for someone else to.

The full spec is at readbyagents.com/spec. If you publish a newsletter, you can claim it and get your entire archive converted to this format at readbyagents.com.

Sources

Stockl, S. and Nitu, P. "Are AI Agents Interacting with Online Ads?" University of Applied Sciences Upper Austria, April 2025. arXiv:2504.07112

Wu, S. et al. "Ads in AI Chatbots? An Analysis of How LLMs Navigate Conflicts of Interest." Princeton University and University of Washington, April 2026. arXiv:2604.08525

InboxReads. "State of Newsletters: Monetization." December 2025. ppc.land

New York State Legislature. "AI Advertising Disclosure Law." Effective June 2026. National Law Review