Scraping Web Pages with Regular Expressions - 噜啦 - 惯性聚合


Author: 噜啦 · 2019-10-09 · via 噜啦 - 爬虫


Published: October 9, 2019
1,658 views
159 words
Category: Python Notebook


My first web scraper, hahaha.

Code

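import re

import requests

# Fetch the listing page as text.
content = requests.get('http://www.cnu.cc/discoveryPage/hot-0').text
# Capture the link, title, author, and image URL; re.S lets "." match newlines.
pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</d.*?author">(.*?)</di.*?src="(.*?)"', re.S)
results = pattern.findall(content)
print(results)

for result in results:
    url, name, author, ads = result
    # Remove all whitespace from the title and author; raw strings keep \s a valid escape.
    print(url, re.sub(r'\s', '', name), re.sub(r'\s', '', author), ads)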
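
To see what the pattern captures without hitting the network, here is a minimal sketch that runs the same regular expression over a hard-coded snippet. The markup in SAMPLE_HTML is hypothetical, only assumed to resemble an entry on the listing page, and extract_items is an illustrative helper name, not something from the original script. It also uses str.strip() instead of re.sub, so spaces inside a title survive.

```python
import re

# Same pattern as in the post; re.S lets "." match newlines,
# because each entry spans several lines of HTML.
PATTERN = re.compile(
    r'<a href="(.*?)".*?title">(.*?)</d.*?author">(.*?)</di.*?src="(.*?)"',
    re.S,
)

# Hypothetical markup, assumed to resemble entries on the listing page.
SAMPLE_HTML = """
<a href="http://example.com/works/1" class="thumbnail">
  <div class="title">
    Sample Work
  </div>
  <div class="author">
    Some Author
  </div>
  <img src="http://example.com/img/1.jpg" alt="">
</a>
<a href="http://example.com/works/2" class="thumbnail">
  <div class="title">
    Another Work
  </div>
  <div class="author">
    Other Author
  </div>
  <img src="http://example.com/img/2.jpg" alt="">
</a>
"""


def extract_items(html):
    """Return (url, title, author, image_url) tuples with outer whitespace trimmed."""
    return [
        (url, title.strip(), author.strip(), image_url)
        for url, title, author, image_url in PATTERN.findall(html)
    ]


for item in extract_items(SAMPLE_HTML):
    print(item)
```

Because every group is lazy (.*?), each match stops at the first src="..." it finds, so the two entries come back as two separate tuples. A regular expression like this depends on the exact order of the tags, so it would likely need adjusting if the page layout changes.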

Run

Here are the image URLs of the young ladies' photos.
