I audited 60 websites for AI readability. Here is what I found.

Two days ago, I built CiteLens.

It audits any website for AI readability and gives it a score out of 100 across six categories:

Content clarity
Structured data
Semantic structure
Entity definition
llms.txt presence
Factual density

In the first 48 hours, 60 sites went through it.

I expected some variation.

I did not expect the same problems to show up again and again.

The numbers

The average score was 53/100.

That does not mean the content was bad. A lot of the sites had useful, well-written pages.

The problem was that the content was not always easy for AI systems to read, extract, cite, or understand.

Across the 60 sites:

Top performers, 70+: 18%
Average, 40 to 69: 51%
Struggling, below 40: 31%
Highest score: 91/100, which was CiteLens itself
Lowest score: 7/100, from a SaaS that recently crossed $100 MRR
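
The three bands above can be written as a small function. This is only a sketch in Python: the cutoffs (70 and 40) come from the list above, while the function name `tier` and the returned labels are my own shorthand, not part of CiteLens.

```python
def tier(score):
    """Map a 0-100 audit score to one of the three bands used in this post."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "Top performer"
    if score >= 40:
        return "Average"
    return "Struggling"


# Scores mentioned in this post: highest, average, lowest.
for score in (91, 53, 7):
    print(score, tier(score))
```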

The main takeaway:

Most websites are still built for humans scrolling a page, not for AI systems trying to understand and cite them.

What CiteLens checks

CiteLens scores every site across six areas.

Content clarity

Is the writing direct and quotable, or buried in vague marketing copy?

Structured data

Does the site use schema.org vocabulary, JSON-LD, and Open Graph tags?
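
For anyone who has not written JSON-LD before, here is a minimal sketch in Python of what such a block can look like. The `WebSite` type and its `name`, `url`, and `description` properties are standard schema.org vocabulary. The function name `json_ld_script`, the `https://` scheme, and the description wording are assumptions for illustration, not how CiteLens builds or checks markup.

```python
import json


def json_ld_script(name, url, description):
    """Return a <script> tag holding a schema.org WebSite description."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "name": name,
        "url": url,
        "description": description,
    }
    # JSON-LD is plain JSON inside a script tag with this MIME type.
    # Escape "</" so a value cannot close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(json_ld_script(
    "CiteLens",
    "https://citelens.dev",  # scheme is an assumption
    "Audits any website for AI readability and scores it out of 100.",
))
```

The output goes into the page's HTML so it is present in the initial response, not injected later by client-side code.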

Semantic structure

Are headings, lists, and page sections easy to follow?
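
One way to make "easy to follow" concrete is to check the heading outline. The sketch below uses Python's standard `html.parser` and flags two things: pages without exactly one h1, and heading levels that skip (an h1 followed directly by an h3). Both rules are my own assumptions about what a reasonable check looks like, and `outline_problems` is a hypothetical name. It is not the CiteLens scoring logic.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for h1..h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = 0  # 0 means "not inside a heading"
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = 0


def outline_problems(html):
    """Return human-readable problems found in the heading outline."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    problems = []
    h1_count = sum(1 for level, _ in parser.headings if level == 1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    previous = 0
    for level, text in parser.headings:
        # Going deeper by more than one level skips part of the outline.
        if previous and level > previous + 1:
            problems.append(f"h{previous} jumps to h{level} at '{text}'")
        previous = level
    return problems


print(outline_problems("<h1>Docs</h1><h3>Install</h3>"))
```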

Entity definition

Does the site clearly say what it is, who it serves, and what it does?

`llms.txt` presence

Does the site provide an AI-readable map of its most important content?

Factual density

Does the site include concrete names, dates, stats, sources, and specific claims?

Finding 1: 94% of sites had no llms.txt

This was the most common gap.

llms.txt is an emerging standard proposed by Jeremy Howard, co-founder of Answer.AI, in September 2024.

It is a markdown file placed at the root of your domain. The goal is to tell AI systems what your site does and where your most important content lives.

Think of it as a sitemap for AI crawlers.

Out of the 60 sites audited, 94% had no llms.txt file.

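The sites that did have one scored, on average, 23 points higher than sites without one. Part of that gap is built in, because llms.txt presence is itself one of the six scored categories.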

That does not mean llms.txt fixes everything by itself. But it is one of the easiest signals to add, and most sites still do not have it.
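
If you have never seen one, the layout in the llms.txt proposal is short: an H1 with the site name, a blockquote summary, then H2 sections containing lists of links with optional notes. Here is a minimal sketch in Python that renders that layout. The function name `render_llms_txt`, the `example.com` URLs, and the section titles are placeholders I made up. This is not the generator CiteLens uses.

```python
def render_llms_txt(name, summary, sections):
    """Render an llms.txt body.

    sections maps an H2 title to a list of (link_text, url, note) tuples.
    An empty note leaves the link without a trailing description.
    """
    lines = [f"# {name}", "", f"> {summary}"]
    for title, links in sections.items():
        lines += ["", f"## {title}", ""]
        for text, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{text}]({url}){suffix}")
    return "\n".join(lines) + "\n"


# Placeholder site; swap in your own pages.
print(render_llms_txt(
    "Example Product",
    "Example Product does one specific thing for one specific audience.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart.md", "install and first run"),
            ("API reference", "https://example.com/docs/api.md", ""),
        ],
        "Optional": [
            ("Changelog", "https://example.com/changelog.md", "release history"),
        ],
    },
))
```

Save the output as `llms.txt` at the root of your domain.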

Finding 2: JavaScript is hiding content from AI

This was the most surprising issue.

A lot of SaaS and developer tool websites look perfect in a browser. But when fetched without JavaScript, the actual content is missing.

That matters because many AI crawlers and indexing systems primarily read the initial HTML response.

If your product description, docs, pricing, use cases, and landing page copy only appear after client-side JavaScript runs, AI systems may not see them.

A simple test:

Disable JavaScript in your browser and load your site.

Whatever you see is close to what many AI crawlers see.

This affected roughly 40% of the developer tools and SaaS sites audited.

Some had strong content. Some had clear positioning. Some had useful product pages.

But the content was not present in the initial HTML.

The fix is usually server-side rendering, static generation, or a framework that outputs meaningful HTML by default.

Next.js with SSR or static generation works. Nuxt with SSR works. Astro works. Static pages work.

A blank root div does not.
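
You can also run the no-JavaScript test from a script. The sketch below, using only Python's standard library, fetches the raw HTML and counts the words a non-rendering client would find, ignoring script, style, and template contents. The names `fetch_raw_html` and `visible_word_count` and the User-Agent string are hypothetical, and a word count is only a rough proxy. This is not how CiteLens measures it.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleText(HTMLParser):
    """Collect text present in raw HTML, without running any scripts."""

    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html):
    """Count words available in the initial HTML response."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def fetch_raw_html(url):
    """Fetch a page the way a simple crawler would: one GET, no rendering."""
    request = Request(url, headers={"User-Agent": "readability-check/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


shell = '<body><div id="root"></div><script>render()</script></body>'
rendered = "<body><h1>CiteLens</h1><p>Audits any website for AI readability.</p></body>"
print(visible_word_count(shell), visible_word_count(rendered))
```

To check a live page, pass the result of `fetch_raw_html("https://your-site.example")` to `visible_word_count`. A count near zero on a page that looks full in the browser points to client-side rendering.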

Finding 3: Marketing copy is hard for AI to cite

The worst-scoring category across all 60 sites was content clarity.

The issue was not bad writing.

The issue was writing for emotion instead of extraction.

For example:

We help businesses unlock their full potential.

That might sound fine in a hero section, but there is nothing specific for an AI system to quote, summarize, or cite.

Compare that with:

CiteLens audits any website for AI readability and scores it out of 100 across six categories.

That sentence is boring.

It is also specific, factual, and citable.

AI systems cite facts, not feelings.

If your homepage reads like a pitch deck, it is harder for AI to understand what you do.

The fix is simple:

Add one plain-language sentence near the top of every important page that states:

What you do
Who you serve
What outcome you deliver

Boring is citable.

Finding 4: Unsourced statistics are weak signals

A lot of sites used numbers.

Very few sourced them.

For example:

78% of buyers choose the first responder.

That claim is much weaker without attribution.

AI systems are less likely to repeat claims they cannot verify.

The same statistic with a credible source attached becomes much more useful.

This showed up across the sample again and again.

Lots of stats. Almost no sources.

The fix:

For every statistic on your site, add the source in parentheses, in a footnote, or near the claim.

If you cannot source it, either remove it or reframe it as your own observation.
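
To find candidates on your own pages, a rough text scan is enough. The sketch below, in Python, flags sentences that contain a percentage but no obvious source hint. The hints it looks for (a parenthetical ending in a year, "according to", "source:", or a URL) are my own heuristic and will miss or misflag some cases. "Placeholder Report" in the example is made up, not a real citation.

```python
import re

# A number followed by a percent sign, such as "78%" or "4.5 %".
STAT = re.compile(r"\b\d+(?:\.\d+)?\s?%")

# Heuristic signs of attribution; an assumption, not a complete list.
SOURCE_HINT = re.compile(
    r"\([^)]*\b(?:19|20)\d{2}\)|\baccording to\b|\bsource:|https?://",
    re.IGNORECASE,
)


def unsourced_stats(text):
    """Return sentences that state a percentage without a source hint."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if STAT.search(s) and not SOURCE_HINT.search(s)]


copy = (
    "78% of buyers choose the first responder. "
    "Churn fell 12% (Placeholder Report, 2024). "
    "We love our users!"
)
for sentence in unsourced_stats(copy):
    print("needs a source:", sentence)
```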

Finding 5: The best sites were not always the best designed

The highest-scoring sites were not necessarily the prettiest.

They were not always the biggest.

They were not always the most polished.

They had one thing in common:

They were written clearly.

The best pages had:

Clear headings
Concrete claims
Named entities
Strong page structure
Specific product explanations
Factual, quotable language

Cloudflare scored 82/100 in the audit. Their documentation pages read like technical references because that is what they are.

That is exactly what AI systems prefer.

The lowest-scoring sites read more like pitch decks: high emotion, low information density.

The JavaScript problem deserves more attention

Several sites had good content, solid structured data, and clear entity definitions.

But their scores were still low.

The reason was client-side rendering.

One productivity tool scored 18/100 despite having genuinely useful content.

When the page was fetched without JavaScript, the response was basically:

<div id="root"></div>

That was it.

One empty div.

The founder had no idea. The site looked perfect in their browser.

This is going to become a bigger problem as more discovery shifts from search engines to answer engines.

If AI systems cannot read your site, they cannot cite you.

What CiteLens is becoming

CiteLens started as a diagnostic tool.

Now it is becoming managed infrastructure for generative engine optimization (GEO).

Showing someone a score is useful, but it only solves half the problem.

The next step is fixing the issues automatically.

So far, CiteLens can help with:

llms.txt generation

CiteLens reads your site and generates an AI-readable llms.txt file.

Managed hosting

Your file can be hosted at:

citelens.dev/serve/yourdomain/llms.txt

Automatic updates

Every Monday, CiteLens re-reads your site and updates the file.

npm middleware for Next.js

For developers, CiteLens can serve llms.txt natively from your app.

npm i @citelens/middleware && npx citelens-setup

Two commands. Your llms.txt is live.

Try it

Paste any URL into CiteLens.

It is free. No login required. Results take about 30 seconds.

Run your site and see what AI systems can actually read.

Then drop your domain and score in the comments.

CiteLens is live at:

citelens.dev

The npm package is here:

npmjs.com/package/@citelens/middleware

The one thing I would do today:

Run your site through CiteLens.

If your score is below 50, check whether your content renders without JavaScript.

You may not have a content problem.

You may have a visibility problem.
