惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

T
Tenable Blog
H
Heimdal Security Blog
K
Kaspersky official blog
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
S
Schneier on Security
G
GRAHAM CLULEY
U
Unit 42
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
C
CERT Recently Published Vulnerability Notes
Google DeepMind News
Google DeepMind News
罗磊的独立博客
Stack Overflow Blog
Stack Overflow Blog
阮一峰的网络日志
阮一峰的网络日志
Simon Willison's Weblog
Simon Willison's Weblog
C
Cisco Blogs
Cyberwarzone
Cyberwarzone
T
The Exploit Database - CXSecurity.com
Project Zero
Project Zero
Security Archives - TechRepublic
Security Archives - TechRepublic
www.infosecurity-magazine.com
www.infosecurity-magazine.com
博客园 - 司徒正美
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
V
Visual Studio Blog
博客园 - Franky
Engineering at Meta
Engineering at Meta
WordPress大学
WordPress大学
Jina AI
Jina AI
P
Proofpoint News Feed
P
Proofpoint News Feed
有赞技术团队
有赞技术团队
L
LINUX DO - 最新话题
宝玉的分享
宝玉的分享
N
News and Events Feed by Topic
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
博客园 - 聂微东
T
The Blog of Author Tim Ferriss
Spread Privacy
Spread Privacy
Application and Cybersecurity Blog
Application and Cybersecurity Blog
IT之家
IT之家
S
Security Affairs
博客园 - 叶小钗
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
小众软件
小众软件
N
News | PayPal Newsroom
Cloudbric
Cloudbric
AWS News Blog
AWS News Blog
W
WeLiveSecurity
The Last Watchdog
The Last Watchdog
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
NISL@THU
NISL@THU

博客园 - seven1314pp

SAP ABAP 客制表-表格维护器-自动带出描述等 SAP ABAP 选择屏幕按钮在同一行 SAP ABAP 一个通用的ALV框架(值得收藏) SAP ABAP SQL CASE 套 CASE SAP ABAP ALV 布局 SAP ABAP开发技巧-ALV报表通过函数获取结构或表字段写入FIELDCAT SAP ABAP财务开发必备ABAP语法之 DO VARYING abap 分割字符串 ABAP 读取其他ALV的显示结果 SAP 会计凭证字段可以更改 SAP ABAP弹出对对话框错误信息设计 SAP ABAP 部分增强点 [ABAP] ABAP 三类内表 财务凭证生成过程中增强技术 SAP BTE增强 ABAP FB02 修改会计凭证的抬头文本/行项目文本的函数 ABAP ALV 单元格按钮 SAP日志表 CDHDR和CDPOS SAP abap外部断点 abap screen页签开发注意事项 abap screen表格控件后续增加栏位
PYTHON 简单的网页图片爬虫
seven1314pp · 2023-09-07 · via 博客园 - seven1314pp

直接上代码:

'''
简单的网页图片爬虫   
要先安装requests,BeautifulSoup的库  
pip install requests
pip install bs4  是一个可以从HTML或XML文件中提取数据的Python库
pip install lxml
'''
import requests  #导入requests库
from bs4 import BeautifulSoup


def get_htmls(pages=list(range(2, 5))):
    #获取待爬取的网页
    pages_list = []
    for page in pages:
        url = f"https://pic.netbian.com/4kfengjing/index_{page}.html"  #网址
        response = requests.get(url)
        response.encoding = 'gbk'
        pages_list.append(response.text)
    return pages_list


def get_picturs(htmls):
    #获取所有图片,并下载
    for html in htmls:
        soup = BeautifulSoup(html, 'html.parser')  #解析html或xml
        # print(soup.prettify())  #把要解析的字符串以标准的缩进格式输出
        # print(soup.title.string)  #输出HTML中title节点的文本内容
        # print(soup.link.attrs)  #中间的link是页签?比如<link> <title> <head>
        # print(soup.link.attrs['href'])  #指定节点的数据

        pic_li = soup.find('div', id='main').find('div', class_='slist').find(
            'ul', class_='clearfix')
        image_path = pic_li.find_all('img')
        for file in image_path:
            pic_name = './partice05' + file['alt'].replace(" ", '_') + '.jpg'
            src = file['src']
            src = f"https://pic.netbian.com/{src}"

            response = requests.get(src)

            with open(pic_name, 'wb') as f:
                f.write(response.content)
                print("picturs dowmload in:{}".format(pic_name))


htmls = get_htmls(pages=list(range(2, 3)))  #得到网页的代码list
# print(htmls)
get_picturs(htmls)