Sharing a Robots Protocol Rule Set for Typecho Blogs

Huo · 2025-03-01 · via 记录生活，精彩一刻 - typecho

Preface

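Recently, after logging in to Bing Webmaster Tools, I found a new SEO issue: many pages share the same title. I eventually resolved it by setting Robots rules.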

What are Robots rules?

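The robots protocol, also known as the crawler protocol or crawler rules, lets a site publish a robots.txt file that tells search engines which pages may be crawled and which may not. Search engines read robots.txt to decide whether a given page is allowed to be crawled.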

So how do you set it up?

First, create a robots.txt file in the root directory of your site,
for example this site's: https://9sb.net/robots.txt
Then add the following content to the file:

# robots.txt
User-agent: *
Allow: /*.html
Allow: /tag
Allow: /category
Disallow: /user
Disallow: /feed
Disallow: /author
Disallow: /*?scroll=comment-*
Disallow: /*/comment-page-*
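
To see how these rules interact, here is a minimal Python sketch of the matching model described in RFC 9309: `*` matches any run of characters, the longest matching pattern wins, and Allow wins a tie. The function names `pattern_to_regex` and `is_allowed` and the sample paths are my own assumptions for illustration; real crawlers may differ in details such as percent-encoding, and not every crawler supports wildcards.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty rule value restricts nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "Allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# The "User-agent: *" rules from the file above.
RULES = [
    ("Allow", "/*.html"),
    ("Allow", "/tag"),
    ("Allow", "/category"),
    ("Disallow", "/user"),
    ("Disallow", "/feed"),
    ("Disallow", "/author"),
    ("Disallow", "/*?scroll=comment-*"),
    ("Disallow", "/*/comment-page-*"),
]

# Hypothetical paths, chosen only to exercise the rules.
for path in ["/archives/12.html",
             "/archives/12.html?scroll=comment-5",
             "/archives/12.html/comment-page-2",
             "/feed/"]:
    print(path, is_allowed(path, RULES))
```

Under this model the article page itself stays crawlable, while the `?scroll=comment-` and `comment-page-` variants of the same page are blocked, because those Disallow patterns are longer than `Allow: /*.html`. Those variants are the kind of URL that produces pages with identical titles.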

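Here Allow permits crawling and Disallow forbids it. The * character can be used as a wildcard (it is a simple wildcard, not a regular expression). The Disallow rules above cover the URLs that are likely to produce large numbers of duplicate links.

You can also append my extended list below. These are junk crawlers and AI crawlers; letting them crawl brings no benefit and may hurt site performance. Add the following to the same file, separated from the rules above by a blank line so the two parts are easy to tell apart.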

User-agent: DotBot
Disallow: /
User-agent: DataForSeoBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Feedly
Disallow: /
User-agent: ias-ir
Disallow: /
User-agent: adsbot
Disallow: /
User-agent: barkrowler
Disallow: /
User-agent: Mail.RU_Bot
Disallow: /
User-agent: SEOkicks
Disallow: /
User-agent: ias-va
Disallow: /
User-agent: proximic
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: grapeshot
Disallow: /
User-agent: BLEXBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: ImagesiftBot
Disallow: /
User-agent: GoogleOther
Disallow: /
User-agent: Applebot
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: peer39 crawler
Disallow: /
User-agent: FriendlyCrawler
Disallow: /
User-agent: magpie-crawler
Disallow: /
User-agent: omgili
Disallow: /
User-agent: Meltwater
Disallow: /
User-agent: AwarioSmartBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: img2dataset
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: PipiBot
Disallow: /
User-agent: Seekr
Disallow: /
User-agent: scoop.it
Disallow: /
User-agent: AwarioRssBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
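
A list this long is easy to get wrong by hand, so here is a rough Python sketch that renders such a block list without repeated entries and then checks it with the standard library's `urllib.robotparser`. The helper names `render_blocklist` and `is_blocked` and the shortened `BLOCKED` list are assumptions for illustration. As far as I know, the standard-library parser does simple prefix matching and does not interpret `*` inside paths, so it suits `Disallow: /` groups like these, not the wildcard rules earlier in the file.

```python
from urllib.robotparser import RobotFileParser

# A few names from the list above; CCBot is repeated on purpose.
BLOCKED = ["SemrushBot", "AhrefsBot", "CCBot", "GPTBot", "ClaudeBot", "CCBot"]

def render_blocklist(names):
    """Return robots.txt groups that block each name once, keeping order."""
    unique = list(dict.fromkeys(names))  # drops the second CCBot in BLOCKED
    return "\n\n".join(f"User-agent: {name}\nDisallow: /" for name in unique)

def is_blocked(robots_text, agent, url="https://9sb.net/"):
    """True if the given user agent may not fetch url under robots_text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, url)

# A cut-down general group followed by the generated block list.
text = "User-agent: *\nDisallow: /user\n\n" + render_blocklist(BLOCKED)

print(is_blocked(text, "GPTBot"))     # True: GPTBot has its own Disallow: / group
print(is_blocked(text, "Googlebot"))  # False: falls back to the "*" group
```

Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but it does not technically stop a crawler that ignores it.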

Finally, add your sitemap to the file, for example: Sitemap: https://9sb.net/sitemap.xml
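
As a quick sanity check, the Sitemap line can be read back with Python's `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). This is only a sketch: `ROBOTS_TXT` is a cut-down stand-in for the real file, and `list_sitemaps` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Cut-down stand-in for the real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user

Sitemap: https://9sb.net/sitemap.xml
"""

def list_sitemaps(robots_text):
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none

print(list_sitemaps(ROBOTS_TXT))  # ['https://9sb.net/sitemap.xml']
```

The Sitemap line does not belong to any User-agent group, so it can go anywhere in the file; putting it at the end just keeps it easy to find.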

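Next, go to each search engine's webmaster platform and submit the Robots rules; see the article "7 Major Search Engines and Their Webmaster Platforms" (《7大搜索引擎以及各自的站长平台》). After that, simply wait.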

Reposted from: 句号网络
