Scraping a Web Page with Regular Expressions - 噜啦 - 惯性聚合


Author: 噜啦 · 2019-10-09 · via 噜啦 - 正则

Scraping a Web Page with Regular Expressions

Published: October 9, 2019
1658 views
159 words
Category: Python Notebook


My first web scraper, hahaha!

Code

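import re

import requests

# Fetch the HTML of the listing page.
content = requests.get('http://www.cnu.cc/discoveryPage/hot-0').text

# Four capture groups: link URL, title, author, image URL.
# re.S lets '.' match newlines, because each entry spans several lines.
pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</d.*?author">(.*?)</di.*?src="(.*?)"', re.S)
results = pattern.findall(content)
print(results)

for result in results:
    url, name, author, ads = result
    # Remove all whitespace from title and author; r'\s' avoids an invalid-escape warning.
    print(url, re.sub(r'\s', '', name), re.sub(r'\s', '', author), ads)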
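To see what the four capture groups pick up without hitting the network, here is a minimal sketch that runs the same pattern against a small HTML snippet. The markup, the example.com URLs, the names in it, and the helper `extract_works` are all made up for illustration; the real page structure may differ.

```python
import re

# Hypothetical markup, invented for illustration only.
SAMPLE_HTML = '''
<div class="work-thumbnail">
  <a href="http://example.com/works/1" class="thumbnail" target="_blank">
    <div class="title">
      Morning Light
    </div>
    <div class="author">
      Alice
    </div>
    <img src="http://example.com/img/1.jpg" alt="Morning Light">
  </a>
</div>
<div class="work-thumbnail">
  <a href="http://example.com/works/2" class="thumbnail" target="_blank">
    <div class="title">Harbor</div>
    <div class="author">Bob</div>
    <img src="http://example.com/img/2.jpg" alt="Harbor">
  </a>
</div>
'''

# Same pattern as in the post: lazy groups for link, title, author, image URL.
PATTERN = re.compile(
    r'<a href="(.*?)".*?title">(.*?)</d.*?author">(.*?)</di.*?src="(.*?)"',
    re.S,  # '.' also matches newlines, so one match can span several lines
)


def extract_works(html):
    """Return (url, title, author, image_url) tuples found in html."""
    works = []
    for url, name, author, image_url in PATTERN.findall(html):
        # Like the post, drop every whitespace character from title and author.
        works.append((url, re.sub(r'\s', '', name), re.sub(r'\s', '', author), image_url))
    return works


works = extract_works(SAMPLE_HTML)
for work in works:
    print(work)
```

Note that `re.sub(r'\s', '', name)` also deletes the spaces inside a title, so "Morning Light" comes out as "MorningLight". If internal spaces should survive, `name.strip()` would be one alternative.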

Run

Attached: the image URLs of the girls' photos.

