Crawler for Column Articles on a Certain "Niu-Ke" Site - Lan小站-嗯,不错!
Lan · 2023-10-08 · via Lan小站-嗯,不错! - Python

The code has been sanitized; substitute the real values yourself.

# @Time    : 2023/10/8 14:43
# @Author  : Lan
# @File    : niukespider.py
# @Software: PyCharm
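import os
import time

import requests


def get_category(catalog='10klpm'):
    """Fetch the column's table of contents as parsed JSON."""
    url = f'https://www.lanol.cn.com/content/zhuanlan/index/catalog/{catalog}'
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    return response.json()


# HTML page skeleton; {{content}} is replaced with the article body.
PAGE_TEMPLATE = """
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
{{content}}
</body>
</html>
"""


def get_content(catalog, entity):
    """Fetch one article's detail JSON; the `_` query parameter is a millisecond timestamp."""
    url = f'https://www.lanol.cn.com/content/zhuanlan/index/detail/{catalog}/{entity}?_={int(time.time() * 1000)}'
    response = requests.get(url, timeout=10, headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
    })
    response.raise_for_status()
    return response.json()


if __name__ == '__main__':
    catalog = 'Gj5x2m'
    # open() does not create missing directories, so make sure ./docs exists first.
    os.makedirs('./docs', exist_ok=True)

    for item in get_category(catalog)['data']['catalog']:
        content = get_content(catalog, item['uuid'])['data']
        # A '/' in the title would be treated as a path separator, so replace it with '-'.
        with open('./docs/' + item['title'].replace('/', '-') + '.html', 'w', encoding='utf-8') as f:
            f.write(PAGE_TEMPLATE.replace('{{content}}', content['content']))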
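
The last two steps of the script (turning a title into a file name and filling the page skeleton) can be tried without touching the network. The following is only a minimal sketch: `safe_filename`, `render_page`, and `TEMPLATE` are hypothetical names that do not appear in the original script, and the list of characters treated as unsafe is an assumption (the original only replaces `/`).

```python
import re

# Hypothetical stand-in for the page skeleton used by the script.
TEMPLATE = "<html><body>\n{{content}}\n</body></html>"

# Assumption: characters that are not allowed in file names on Windows or POSIX.
_UNSAFE = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title, replacement='-'):
    """Replace path-hostile characters in title and append '.html'."""
    cleaned = _UNSAFE.sub(replacement, title).strip()
    # Fall back to a fixed name if nothing usable is left.
    return (cleaned or 'untitled') + '.html'


def render_page(body, template=TEMPLATE):
    """Insert body at the {{content}} placeholder of template."""
    return template.replace('{{content}}', body)
```

For example, `safe_filename('A/B')` gives `'A-B.html'`, which matches what the script's own `replace('/', '-')` would produce for that title.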