惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

爱范儿
爱范儿
博客园_首页
W
WeLiveSecurity
S
Secure Thoughts
S
Security @ Cisco Blogs
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Hugging Face - Blog
Hugging Face - Blog
www.infosecurity-magazine.com
www.infosecurity-magazine.com
H
Hacker News: Front Page
Project Zero
Project Zero
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
U
Unit 42
N
News and Events Feed by Topic
N
News and Events Feed by Topic
Hacker News - Newest:
Hacker News - Newest: "LLM"
Forbes - Security
Forbes - Security
T
Tor Project blog
I
Intezer
B
Blog
F
Full Disclosure
Security Archives - TechRepublic
Security Archives - TechRepublic
F
Fortinet All Blogs
Schneier on Security
Schneier on Security
T
Threat Research - Cisco Blogs
AI
AI
Google DeepMind News
Google DeepMind News
L
LINUX DO - 最新话题
Cloudbric
Cloudbric
L
Lohrmann on Cybersecurity
WordPress大学
WordPress大学
博客园 - 聂微东
雷峰网
雷峰网
P
Privacy International News Feed
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
PCI Perspectives
PCI Perspectives
Y
Y Combinator Blog
Spread Privacy
Spread Privacy
Simon Willison's Weblog
Simon Willison's Weblog
罗磊的独立博客
Vercel News
Vercel News
A
Arctic Wolf
The Register - Security
The Register - Security
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Microsoft Azure Blog
Microsoft Azure Blog
H
Heimdal Security Blog
Know Your Adversary
Know Your Adversary
P
Proofpoint News Feed
C
Cybersecurity and Infrastructure Security Agency CISA
P
Proofpoint News Feed

噜啦 - 正则

正则表达式库函数match与search的区别 - 噜啦 正则表达式分组功能实例 - 噜啦
结合正则表达式爬取网页 - 噜啦
博主: 噜啦 · 2019-10-09 · via 噜啦 - 正则

结合正则表达式爬取网页

  • 发布时间:
  • 1658 次浏览
  • 159字数
  • 分类: Python笔记本
  1. 首页
  2. 正文  

第一个爬虫哈哈哈哈哈

代码

import requests
import re

content = requests.get('http://www.cnu.cc/discoveryPage/hot-0').text
pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</d.*?author">(.*?)</di.*?src="(.*?)"', re.S)
results = re.findall(pattern, content)
print(results)

for result in results:
    url, name, author, ads = result
    print(url, re.sub('\s', '', name), re.sub('\s', '', author), ads)

运行


附上小姐姐图片地址

赞赏作者

如果觉得我的文章对你有用,请随意赞赏

结合正则表达式爬取网页

 •