Scraping Images with a Crawler - 噜啦
Author: 噜啦 · 2019-10-09 · via 噜啦 - Crawlers


Category: Python Notebook

Code

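from bs4 import BeautifulSoup
import requests
import os
import shutil
from urllib.parse import urljoin, urlparse

# Browser-like request headers so the site serves the normal page
headers = {
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
}

# Images are saved to ./download, which is created if it does not exist yet
download_dir = os.path.abspath('./download')
os.makedirs(download_dir, exist_ok=True)


def download_jpg(image_url, image_localpath):
    """Stream one image to disk without loading it fully into memory."""
    response = requests.get(image_url, headers=headers, stream=True, timeout=10)
    if response.status_code == 200:
        with open(image_localpath, 'wb') as f:
            # decode_content lives on the raw urllib3 response, not on the
            # requests Response; it undoes gzip/deflate encoding while copying
            response.raw.decode_content = True
            shutil.copyfileobj(response.raw, f)


def craw(url):
    """Download every <img> found inside <div class="group"> on one list page."""
    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, 'lxml')
    for div in soup.find_all('div', class_='group'):
        for img in div.find_all('img'):
            src = img.get('src')
            if not src:
                continue
            # Resolve relative paths such as /data/a.jpg against the page URL
            imgurl = urljoin(url, src)
            # Name the file after the URL path only (drops any ?query part)
            filename = os.path.basename(urlparse(imgurl).path)
            imgpath = os.path.join(download_dir, filename)
            print('Downloading %s' % imgurl)
            download_jpg(imgurl, imgpath)


# Crawl list pages 1 through 9
for i in range(1, 10):
    url = 'http://xxxxxx.com/plugin.php?id=group&page=' + str(i)
    print(url)
    print('Page %s' % i)
    craw(url)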
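Because the list URL is elided, the parsing step is hard to try out directly. The sketch below is a standard-library-only variant that swaps BeautifulSoup for `html.parser`, so the extraction logic can be run offline on an HTML string. It roughly mirrors `craw()`: it collects `<img src>` values inside any `div` whose class list includes `group`, resolves them with `urljoin`, and names each local file after the URL path. `GroupImageParser` and `image_targets` are hypothetical names, not part of the original script, and unlike `find_all` on nested groups it reports each image only once.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import os


class GroupImageParser(HTMLParser):
    """Collect <img src> values that appear inside <div class="group"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a div.group (also counts nested divs)
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1          # a div nested inside a group
            elif 'group' in (attrs.get('class') or '').split():
                self.depth = 1           # entering a div.group
        elif tag == 'img' and self.depth and attrs.get('src'):
            self.srcs.append(attrs['src'])

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1


def image_targets(html, page_url, download_dir):
    """Return (absolute image URL, local file path) pairs for one list page."""
    parser = GroupImageParser()
    parser.feed(html)
    parser.close()
    targets = []
    for src in parser.srcs:
        url = urljoin(page_url, src)
        name = os.path.basename(urlparse(url).path)
        targets.append((url, os.path.join(download_dir, name)))
    return targets
```

Each returned pair could then be handed to `download_jpg()` from the script above.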

Run

[Screenshot of a run: 7PiZ22V1Gu.jpg]

Not much substance in this one, so I'll slip away now.
