惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Cloudbric
Cloudbric
E
Exploit-DB.com RSS Feed
SecWiki News
SecWiki News
Forbes - Security
Forbes - Security
N
News | PayPal Newsroom
S
Security @ Cisco Blogs
Schneier on Security
Schneier on Security
V
V2EX - 技术
S
Secure Thoughts
W
WeLiveSecurity
Google DeepMind News
Google DeepMind News
C
CERT Recently Published Vulnerability Notes
NISL@THU
NISL@THU
S
Securelist
S
Security Archives - TechRepublic
Know Your Adversary
Know Your Adversary
V
Vulnerabilities – Threatpost
Security Latest
Security Latest
Recent Commits to openclaw:main
Recent Commits to openclaw:main
G
GRAHAM CLULEY
H
Hacker News: Front Page
Microsoft Azure Blog
Microsoft Azure Blog
I
Intezer
Google Online Security Blog
Google Online Security Blog
美团技术团队
阮一峰的网络日志
阮一峰的网络日志
T
The Exploit Database - CXSecurity.com
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
Webroot Blog
Webroot Blog
Jina AI
Jina AI
Engineering at Meta
Engineering at Meta
P
Proofpoint News Feed
The Cloudflare Blog
I
InfoQ
L
LangChain Blog
U
Unit 42
P
Proofpoint News Feed
S
Schneier on Security
S
Security Affairs
Y
Y Combinator Blog
T
Tenable Blog
N
News and Events Feed by Topic
MyScale Blog
MyScale Blog
量子位
Google DeepMind News
Google DeepMind News
Cyberwarzone
Cyberwarzone
博客园 - 聂微东
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
AWS News Blog
AWS News Blog

噜啦 - 爬虫

爬虫爬取图片 - 噜啦 结合正则表达式爬取网页 - 噜啦
使用爬虫爬取新闻网站标题 - 噜啦
博主: 噜啦 · 2019-10-09 · via 噜啦 - 爬虫

使用爬虫爬取新闻网站标题

  • 发布时间:
  • 1755 次浏览
  • 913字数
  • 分类: Python笔记本
  1. 首页
  2. 正文  

代码

from bs4 import BeautifulSoup
import requests

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-US;q=0.7",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
}

def Title(url):
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    h4 = soup.find_all('h4', class_='news__item-title mt0')
    for title in h4:
        print(title.get_text())

运行

赞赏作者

如果觉得我的文章对你有用,请随意赞赏

使用爬虫爬取新闻网站标题

 •