惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

S
Secure Thoughts
S
Securelist
P
Proofpoint News Feed
D
DataBreaches.Net
Cisco Talos Blog
Cisco Talos Blog
C
CXSECURITY Database RSS Feed - CXSecurity.com
Project Zero
Project Zero
A
About on SuperTechFans
罗磊的独立博客
WordPress大学
WordPress大学
月光博客
月光博客
Latest news
Latest news
C
Cyber Attacks, Cyber Crime and Cyber Security
GbyAI
GbyAI
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
博客园 - 三生石上(FineUI控件)
F
Fortinet All Blogs
W
WeLiveSecurity
Attack and Defense Labs
Attack and Defense Labs
V
Visual Studio Blog
Blog — PlanetScale
Blog — PlanetScale
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
P
Privacy International News Feed
AI
AI
博客园 - 司徒正美
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Stack Overflow Blog
Stack Overflow Blog
M
MIT News - Artificial intelligence
Help Net Security
Help Net Security
T
Tor Project blog
V
Vulnerabilities – Threatpost
C
Cisco Blogs
I
Intezer
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
MyScale Blog
MyScale Blog
雷峰网
雷峰网
MongoDB | Blog
MongoDB | Blog
Forbes - Security
Forbes - Security
V
V2EX
Apple Machine Learning Research
Apple Machine Learning Research
T
Threat Research - Cisco Blogs
B
Blog RSS Feed
博客园 - 叶小钗
N
News and Events Feed by Topic
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
Simon Willison's Weblog
Simon Willison's Weblog
C
CERT Recently Published Vulnerability Notes
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
N
News and Events Feed by Topic

噜啦 - 爬虫

爬虫爬取图片 - 噜啦 使用爬虫爬取新闻网站标题 - 噜啦
结合正则表达式爬取网页 - 噜啦
博主: 噜啦 · 2019-10-09 · via 噜啦 - 爬虫

结合正则表达式爬取网页

  • 发布时间:
  • 1658 次浏览
  • 159字数
  • 分类: Python笔记本
  1. 首页
  2. 正文  

第一个爬虫哈哈哈哈哈

代码

import requests
import re

content = requests.get('http://www.cnu.cc/discoveryPage/hot-0').text
pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</d.*?author">(.*?)</di.*?src="(.*?)"', re.S)
results = re.findall(pattern, content)
print(results)

for result in results:
    url, name, author, ads = result
    print(url, re.sub('\s', '', name), re.sub('\s', '', author), ads)

运行


附上小姐姐图片地址

赞赏作者

如果觉得我的文章对你有用,请随意赞赏

结合正则表达式爬取网页

 •