Two days ago, I built CiteLens.
It audits any website for AI readability and gives it a score out of 100 across six categories:
- Content clarity
- Structured data
- Semantic structure
- Entity definition
- llms.txt presence
- Factual density
In the first 48 hours, 60 sites went through it.
I expected some variation.
I did not expect the same problems to show up again and again.
The numbers
The average score was 53/100.
That does not mean the content was bad. A lot of the sites had useful, well-written pages.
The problem was that the content was not always easy for AI systems to read, extract, cite, or understand.
Across the 60 sites:
- Top performers, 70+: 18%
- Average, 40 to 69: 51%
- Struggling, below 40: 31%
- Highest score: 91/100, which was CiteLens itself
- Lowest score: 7/100, from a SaaS that recently crossed $100 MRR
The main takeaway:
Most websites are still built for humans scrolling a page, not for AI systems trying to understand and cite them.
What CiteLens checks
CiteLens scores every site across six areas.
Content clarity
Is the writing direct and quotable, or buried in vague marketing copy?
Structured data
Does the site use schema.org, JSON-LD, and OpenGraph?
Semantic structure
Are headings, lists, and page sections easy to follow?
Entity definition
Does the site clearly say what it is, who it serves, and what it does?
llms.txt presence
Does the site provide an AI-readable map of its most important content?
Factual density
Does the site include concrete names, dates, stats, sources, and specific claims?
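To make the structured data check concrete, here is a rough sketch of how you could look for JSON-LD and OpenGraph tags in a page's raw HTML yourself. This is not CiteLens's actual implementation. The names StructuredDataFinder and find_structured_data are made up for illustration, and the sample page is a placeholder.

```python
import json
from html.parser import HTMLParser


class StructuredDataFinder(HTMLParser):
    """Collects JSON-LD blocks and OpenGraph meta tags from raw HTML."""

    def __init__(self):
        super().__init__()
        self.json_ld = []      # parsed JSON-LD objects
        self.open_graph = {}   # e.g. {"og:title": "..."}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        script_type = (attrs.get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:"):
                self.open_graph[prop] = attrs.get("content") or ""

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is simply ignored in this sketch


def find_structured_data(html):
    """Return the JSON-LD objects and OpenGraph tags found in the HTML."""
    finder = StructuredDataFinder()
    finder.feed(html)
    finder.close()
    return {"json_ld": finder.json_ld, "open_graph": finder.open_graph}


# Placeholder page, not a real site from the audit.
sample_page = (
    '<html><head><meta property="og:title" content="Example">'
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Organization", "name": "Example"}'
    "</script></head><body></body></html>"
)
found = find_structured_data(sample_page)
```

An empty result for both keys suggests the page exposes no machine-readable description of itself in the initial HTML.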
Finding 1: 94% of sites had no llms.txt
This was the most common gap.
llms.txt is an emerging standard proposed by Jeremy Howard, co-founder of Answer.AI, in September 2024.
It is a markdown file placed at the root of your domain. The goal is to tell AI systems what your site does and where your most important content lives.
Think of it as a sitemap for AI crawlers.
Out of the 60 sites audited, 94% had no llms.txt file.
The sites that did have one scored, on average, 23 points higher than sites without one.
That does not mean llms.txt fixes everything by itself, and part of the gap is mechanical because llms.txt presence is one of the six scored categories. But it is one of the easiest signals to add, and most sites still do not have it.
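If you want to write one by hand, the proposal describes a simple layout: an H1 with the site name, a blockquote summary, then H2 sections containing lists of links. Here is a minimal sketch that renders that layout. The function name build_llms_txt and the example company, URL, and descriptions are all hypothetical, and this is not the generator CiteLens uses.

```python
def build_llms_txt(title, summary, sections):
    """Render a minimal llms.txt document as markdown.

    sections maps a section heading to a list of (name, url, note) tuples.
    An empty note leaves the link without a description.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for illustration only.
example = build_llms_txt(
    "Example Co",
    "Example Co makes invoicing software for freelancers.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart", "Setup guide"),
        ]
    },
)
print(example)
```

Save the output as llms.txt at the root of your domain so it is reachable at /llms.txt.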
Finding 2: JavaScript is hiding content from AI
This was the most surprising issue.
A lot of SaaS and developer tool websites look perfect in a browser. But when fetched without JavaScript, the actual content is missing.
That matters because many AI crawlers and indexing systems primarily read the initial HTML response.
If your product description, docs, pricing, use cases, and landing page copy only appear after client-side JavaScript runs, AI systems may not see them.
A simple test:
Disable JavaScript in your browser and load your site.
Whatever you see is close to what many AI crawlers see.
This affected roughly 40% of the developer tools and SaaS sites audited.
Some had strong content. Some had clear positioning. Some had useful product pages.
But the content was not present in the initial HTML.
The fix is usually server-side rendering, static generation, or a framework that outputs meaningful HTML by default.
Next.js with SSR or static generation works. Nuxt with SSR works. Astro works. Static pages work.
A blank root div does not.
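You can also script the disable-JavaScript test. The sketch below fetches nothing by default. It takes raw HTML, strips scripts and styles, and counts the words left over. The 50-word threshold is an arbitrary assumption of mine, not a CiteLens rule, and the names visible_text, looks_client_rendered, and fetch_raw_html are made up for this example.

```python
import urllib.request
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that is present in the HTML without running JavaScript."""

    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a non-JavaScript client could read from the HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_client_rendered(html, min_words=50):
    """Heuristic: very little text in the initial HTML suggests a JS-only page."""
    return len(visible_text(html).split()) < min_words


def fetch_raw_html(url):
    """Fetch the initial HTML response, the way a simple crawler would."""
    request = urllib.request.Request(url, headers={"User-Agent": "no-js-check/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

To check a live page, call looks_client_rendered(fetch_raw_html("https://example.com")) with your own URL. A result of True means most of your copy probably only exists after JavaScript runs.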
Finding 3: Marketing copy is hard for AI to cite
The worst-scoring category across all 60 sites was content clarity.
The issue was not bad writing.
The issue was writing for emotion instead of extraction.
For example:
We help businesses unlock their full potential.
That might sound fine in a hero section, but there is nothing specific for an AI system to quote, summarize, or cite.
Compare that with:
CiteLens audits any website for AI readability and scores it out of 100 across six categories.
That sentence is boring.
It is also specific, factual, and citable.
AI systems cite facts, not feelings.
If your homepage reads like a pitch deck, it is harder for AI to understand what you do.
The fix is simple:
Add one plain-language sentence near the top of every important page that states:
- What you do
- Who you serve
- What outcome you deliver
Boring is citable.
Finding 4: Unsourced statistics are weak signals
A lot of sites used numbers.
Very few sourced them.
For example:
78% of buyers choose the first responder.
That claim is much weaker without attribution.
AI systems are less likely to repeat claims they cannot verify.
The same statistic with a credible source attached becomes much more useful.
This showed up across the sample again and again.
Lots of stats. Almost no sources.
The fix:
For every statistic on your site, add the source in parentheses, in a footnote, or near the claim.
If you cannot source it, either remove it or reframe it as your own observation.
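A quick way to find candidates is to scan your copy for percentages that have no obvious attribution nearby. This is a crude heuristic sketch, not how CiteLens scores factual density. The patterns it treats as a source ("Source:", "according to", a URL, or a numbered footnote) are my own assumptions, and it will miss sources given in a separate sentence.

```python
import re

# A percentage such as "78%" or "4.5 %".
STAT = re.compile(r"\d+(?:\.\d+)?\s?%")

# Assumed attribution markers; extend this for your own site's conventions.
SOURCE_HINT = re.compile(
    r"\(source:|according to|https?://|\[\d+\]", re.IGNORECASE
)


def split_sentences(text):
    """Naive sentence splitter: breaks after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def unsourced_stats(text):
    """Return sentences that contain a percentage but no attribution marker."""
    return [
        sentence
        for sentence in split_sentences(text)
        if STAT.search(sentence) and not SOURCE_HINT.search(sentence)
    ]


# The second statistic and its source are invented placeholders.
copy = (
    "78% of buyers choose the first responder. "
    "62% of teams reply within an hour (Source: Example Survey 2024)."
)
flagged = unsourced_stats(copy)
```

Each flagged sentence is a claim to source, remove, or reframe as your own observation.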
Finding 5: The best sites were not always the best designed
The highest-scoring sites were not necessarily the prettiest.
They were not always the biggest.
They were not always the most polished.
They had one thing in common:
They were written clearly.
The best pages had:
- Clear headings
- Concrete claims
- Named entities
- Strong page structure
- Specific product explanations
- Factual, quotable language
Cloudflare scored 82/100 in the audit. Their documentation pages read like technical references because that is what they are.
That is exactly what AI systems prefer.
The lowest-scoring sites read more like pitch decks: high emotion, low information density.
The JavaScript problem deserves more attention
Several sites had good content, solid structured data, and clear entity definitions.
But their scores were still low.
The reason was client-side rendering.
One productivity tool scored 18/100 despite having genuinely useful content.
When the page was fetched without JavaScript, the response was basically:
<div id="root"></div>
That was it.
One empty div.
The founder had no idea. The site looked perfect in their browser.
This is going to become a bigger problem as more discovery shifts from search engines to answer engines.
If AI systems cannot read your site, they cannot cite you.
What CiteLens is becoming
CiteLens started as a diagnostic tool.
Now it is becoming managed GEO (generative engine optimization) infrastructure.
Showing someone a score is useful, but it only solves half the problem.
The next step is fixing the issues automatically.
So far, CiteLens can help with:
llms.txt generation
CiteLens reads your site and generates an AI-readable llms.txt file.
Managed hosting
Your file can be hosted at:
citelens.dev/serve/yourdomain/llms.txt
Automatic updates
Every Monday, CiteLens re-reads your site and updates the file.
npm middleware for Next.js
For developers, CiteLens can serve llms.txt natively from your app.
npm i @citelens/middleware && npx citelens-setup
Two commands. Your llms.txt is live.
Try it
Paste any URL into CiteLens.
It is free. No login required. Results take about 30 seconds.
Run your site and see what AI systems can actually read.
Then drop your domain and score in the comments.
CiteLens is live at:
citelens.dev
The npm package is here:
npmjs.com/package/@citelens/middleware
The one thing I would do today:
Run your site through CiteLens.
If your score is below 50, check whether your content renders without JavaScript.
You may not have a content problem.
You may have a visibility problem.