Scraping a Web Page with Regular Expressions - 噜啦 - 惯性聚合


Author: 噜啦 · 2019-10-09 · via 噜啦 - 正则

Category: Python Notebook


My first web scraper, hahaha!

Code

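import re

import requests

# Fetch the "hot" discovery page of cnu.cc as text.
content = requests.get('http://www.cnu.cc/discoveryPage/hot-0', timeout=10).text

# Four lazy groups: link URL, work title, author, image address.
# re.S lets "." match newlines, because each entry spans several lines of HTML.
pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</d.*?author">(.*?)</di.*?src="(.*?)"', re.S)
results = pattern.findall(content)
print(results)

for url, name, author, img_url in results:
    # Remove all whitespace (newlines, indentation) from the title and author.
    # The raw string r'\s' avoids Python's invalid-escape warning for '\s'.
    print(url, re.sub(r'\s', '', name), re.sub(r'\s', '', author), img_url)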

Running it

The photo image URLs are included too.
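To check what the pattern captures without touching the network, the same expression can be run against a small inline snippet. This is only a sketch: `SAMPLE_HTML`, the `example.com` URLs, and the `extract_entries` helper are made up for illustration, and the markup is shaped to fit the pattern rather than copied from the real cnu.cc page, which may look different.

```python
import re

# Hypothetical markup, written only to fit the pattern; the real page may differ.
SAMPLE_HTML = '''
<a href="http://example.com/works/1" class="thumbnail">
    <div class="title">
        Morning Light
    </div>
    <div class="author">
        Alice
    </div>
    <img src="http://example.com/img/1.jpg" alt="">
</a>
<a href="http://example.com/works/2" class="thumbnail">
    <div class="title">
        City Night
    </div>
    <div class="author">
        Bob
    </div>
    <img src="http://example.com/img/2.jpg" alt="">
</a>
'''

# Same expression as in the post: four lazy groups, with re.S so "." spans newlines.
PATTERN = re.compile(
    r'<a href="(.*?)".*?title">(.*?)</d.*?author">(.*?)</di.*?src="(.*?)"',
    re.S,
)


def extract_entries(html):
    """Return (url, title, author, image_url) tuples with whitespace removed
    from the title and author (hypothetical helper, not from the post)."""
    entries = []
    for url, name, author, img_url in PATTERN.findall(html):
        entries.append((url, re.sub(r'\s', '', name), re.sub(r'\s', '', author), img_url))
    return entries


entries = extract_entries(SAMPLE_HTML)
for entry in entries:
    print(entry)

# Without re.S, "." stops at line breaks, so the multi-line entries never match.
without_flag = re.findall(PATTERN.pattern, SAMPLE_HTML)
print(without_flag)
```

Two things stand out in this sketch. `re.sub(r'\s', '', ...)` removes every whitespace character, so a title such as "Morning Light" comes out as "MorningLight"; that is harmless for titles without spaces, and `str.strip()` would be the alternative when inner spaces matter. Dropping `re.S` yields an empty list here, which is why the flag is needed whenever an entry is spread over several lines.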

