惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

F
Full Disclosure
WordPress大学
WordPress大学
小众软件
小众软件
Cloudbric
Cloudbric
AWS News Blog
AWS News Blog
腾讯CDC
量子位
人人都是产品经理
人人都是产品经理
大猫的无限游戏
大猫的无限游戏
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
V
Vulnerabilities – Threatpost
Scott Helme
Scott Helme
Hugging Face - Blog
Hugging Face - Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
The Hacker News
The Hacker News
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
IT之家
IT之家
Jina AI
Jina AI
Attack and Defense Labs
Attack and Defense Labs
S
SegmentFault 最新的问题
Simon Willison's Weblog
Simon Willison's Weblog
The Cloudflare Blog
阮一峰的网络日志
阮一峰的网络日志
T
Tailwind CSS Blog
Last Week in AI
Last Week in AI
博客园 - 【当耐特】
Google Online Security Blog
Google Online Security Blog
美团技术团队
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
罗磊的独立博客
L
LINUX DO - 最新话题
博客园 - Franky
博客园 - 叶小钗
Apple Machine Learning Research
Apple Machine Learning Research
The Last Watchdog
The Last Watchdog
J
Java Code Geeks
AI
AI
C
Cisco Blogs
酷 壳 – CoolShell
酷 壳 – CoolShell
C
Cyber Attacks, Cyber Crime and Cyber Security
Cisco Talos Blog
Cisco Talos Blog
博客园 - 三生石上(FineUI控件)
雷峰网
雷峰网
Help Net Security
Help Net Security
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
云风的 BLOG
云风的 BLOG
I
Intezer
S
Securelist

matduggan.com

The Intolerable Hypocrisy of Cyberlibertarianism If I Could Make My Own GitHub You can absolutely have an RSS dependent website in 2026 I Can't See Apple's Vision Hosting a Snowflake Proxy Markdown Ate The World Update to the Ghost theme that powers this site Boy I was wrong about the Fediverse I Sold Out for $20 a Month and All I Got Was This Perfectly Generated Terraform GitButler CLI Is Really Good The Year of the 3D Printed Miniature (And Other Lies We Tell Ourselves) SQLite for a REST API Database? Making RSS More Fun I broke and fixed my Ghost blog
The Small Web is Tricky to Find
Mathew Duggan · 2026-02-13 · via matduggan.com

One of the most common requests I've gotten from users of my little Firefox extension(https://timewasterpro.xyz) has been more options around the categories of websites that you get returned. This required me to go through and parse the website information to attempt to put them into different categories. I tried a bunch of different approaches but ended up basically looking at the websites themselves seeing if there was anything that looked like a tag or a hint on each site.

This is the end conclusion of my effort at putting stuff into categories.

Unknown just means I wasn't able to get any sort of data about it. This is the result of me combining Ghost, Wordpress and Kagi Small Web data sources.

Interestingly one of my most common requests is "I would like less technical content" which as it turns out is tricky to provide because it's pretty hard to find. They sorta exist but for less technical users they don't seem to have bought into the value of the small web own your own web domain (or if they have, I haven't been able to figure out a reliable way to find them).

This is an interesting problem, especially because a lot of the tools I would have previously used to solve this problem are....basically broken. It's difficult for me to really use Google web search to find anything at this point even remotely like "give me all the small websites" because everything is weighted to steer me away from that towards Reddit. So anything that might be a little niche is tricky to figure out.

Interesting findings

So there's no point in building a web extension with a weighting algorithm to return less technical content if I cannot find a big enough pool of non-technical content to surface. It isn't that these sites don't exist its just that we never really figured out a way to reliably surface "what is a small website".

So from a technical perspective I have a bunch of problems.

  • First I need to reliably sort websites into a genre, which can be a challenge when we're talking about small websites because people typically write about whatever moves them that day. Most of the content on a site might be technical, but some of it might not be. Big sites tend to be more precise with their SEO settings but small sites that don't care don't do that, so I have fewer reliable signals to work with.
  • Then I need to come up with a lot of different feeding systems for independent websites. The Kagi Small Web was a good starting point, but Wordpress and Ghost websites have a much higher ratio of non-technical content. I need those sites, but it's hard to find a big batch of them reliably.
  • Once I have the type of website as a general genre and I have a series of locations, then I can start to reliably distribute the types of content you get.

I think I can solve....some of these, but the more I work on the problem the more I'm realizing that the entire concept of "the small web" had a series of pretty serious problems.

  • Google was the only place on Earth sending any traffic there
  • Because Google was the only one who knew about it, there never needed to be another distribution system
  • Now that Google is broken, it's almost impossible to recreate that magic of becoming the top of list for a specific subgenre without a ton more information than I can get from public records.