Column article crawler for 某牛某客 - Lan小站-嗯,不错!
Lan · 2023-10-08 · via Lan小站-嗯,不错! - Python

The code has been sanitized; replace the URLs and catalog IDs with your own before running it.

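# @Time    : 2023/10/8 14:43
# @Author  : Lan
# @File    : niukespider.py
# @Software: PyCharm
import os
import time

import requests


def get_category(catalog='10klpm'):
    """Fetch the catalog (table of contents) of a column as parsed JSON."""
    url = f'https://www.lanol.cn.com/content/zhuanlan/index/catalog/{catalog}'
    # A timeout keeps the script from hanging forever on a stalled connection.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


# Minimal HTML page used to wrap each article; {{content}} is replaced with the article body.
c = """
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
{{content}}
</body>
</html>
"""


def get_content(catalog, entity):
    """Fetch one article of a column as parsed JSON."""
    # The millisecond timestamp in the "_" parameter acts as a cache buster.
    url = f'https://www.lanol.cn.com/content/zhuanlan/index/detail/{catalog}/{entity}?_={int(time.time() * 1000)}'
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
    }, timeout=10)
    response.raise_for_status()
    return response.json()


if __name__ == '__main__':
    catalog = 'Gj5x2m'
    # Create the output directory up front; open() fails if it does not exist.
    os.makedirs('./docs', exist_ok=True)

    # Each catalog entry carries the article's uuid and title.
    for i in get_category(catalog)['data']['catalog']:
        content = get_content(catalog, i['uuid'])['data']
        # '/' cannot appear in a file name, so replace it with '-'.
        with open('./docs/' + i['title'].replace('/', '-') + '.html', 'w', encoding='utf-8') as f:
            f.write(c.replace('{{content}}', content['content']))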
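
The script above only swaps `/` for `-` when it turns an article title into a file name. Titles can also contain characters such as `:` or `?`, which Windows rejects. Below is a minimal sketch of a stricter helper. The name `safe_filename` and its parameters are hypothetical and not part of the original script, so treat it as one possible approach rather than the author's method.

```python
import re

# Characters that Windows rejects in file names, plus ASCII control characters.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, replacement='-', max_length=100):
    """Hypothetical helper: turn an article title into a portable file name."""
    # Replace unsafe characters, cap the length, then drop leading and
    # trailing spaces and dots, which some file systems handle badly.
    name = _UNSAFE.sub(replacement, title)[:max_length].strip(' .')
    # Fall back to a fixed name if nothing usable is left.
    return name or 'untitled'
```

Under that assumption, the `open()` call in the main loop could use `'./docs/' + safe_filename(i['title']) + '.html'` in place of the `replace('/', '-')` expression.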