Tips that may save you from the hell of PyYAML
Reorx · 2022-05-15 · via Reorx’s Forge

YAML is a widely used data-serialization language, and as a developer I deal with it regularly. But processing YAML, especially with PyYAML in Python, is painful and full of traps. Here I want to share some tips and snippets that can make your life with PyYAML easier.

Code in this article is only guaranteed to work in Python 3.

Always use safe_load/safe_dump

YAML’s ability to construct arbitrary Python objects makes it dangerous to use blindly. Simply calling yaml.load on a document from an untrusted source, such as the Internet or user input, can harm your application.

From the official PyYAML documentation:

Warning: It is not safe to call yaml.load with any data received from an untrusted source! yaml.load is as powerful as pickle.load and so may call any Python function.

In short, you should always use yaml.safe_load and yaml.safe_dump as the standard I/O methods for YAML.
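As a minimal sketch of what this protects against, the snippet below feeds safe_load a document carrying a Python-specific tag that asks the loader to call a function. The helper name try_safe_load and the sample documents are made up for illustration; the expectation is that safe_load rejects the tagged document with a yaml.YAMLError while plain data loads normally.

```python
import yaml

# A document that asks the loader to call os.getcwd() while parsing.
untrusted = "!!python/object/apply:os.getcwd []"


def try_safe_load(text):
    """Hypothetical helper: return (data, None) or (None, error message)."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        # safe_load has no constructor for python/* tags, so it refuses them.
        return None, str(exc)


data, error = try_safe_load(untrusted)
print(error)

# Plain mappings, lists, strings and numbers are unaffected.
plain, _ = try_safe_load("name: demo\nports: [80, 443]")
print(plain)
```

The same document passed to an unsafe loader would actually run the function, which is why the safe variants are worth using by default.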

Keep keys in order (load/dump)

In Python 3.7+, dicts preserve insertion order, so the dict you get from yaml.safe_load has its keys in the same order as the original file.

>>> import yaml
>>> text = """---
... c: 1
... b: 1
... d: 1
... a: 1
... """
>>> d = yaml.safe_load(text)
>>> d
{'c': 1, 'b': 1, 'd': 1, 'a': 1}
>>> list(d)
['c', 'b', 'd', 'a']

When dumping a dict to a YAML string, pass the keyword argument sort_keys=False to preserve the order of keys. Otherwise the keys are sorted alphabetically.

>>> print(yaml.safe_dump(d))
a: 1
b: 1
c: 1
d: 1
>>> d['e'] = 1
>>> print(yaml.safe_dump(d, sort_keys=False))
c: 1
b: 1
d: 1
a: 1
e: 1

If your Python version is older, or you want to make sure key order is always preserved, you can use a library called oyaml as a drop-in replacement for PyYAML.

>>> import oyaml as yaml
>>> d = yaml.safe_load(text)
>>> d
OrderedDict([('c', 1), ('b', 1), ('d', 1), ('a', 1)])
>>> d['e'] = 1
>>> print(yaml.safe_dump(d, sort_keys=False))
c: 1
b: 1
d: 1
a: 1
e: 1

Enhance list indentation (dump)

By default, PyYAML indents list items at the same level as their parent key.

>>> d = {'a': [1, 2, 3]}
>>> print(yaml.safe_dump(d))
a:
- 1
- 2
- 3

This format goes against style guides such as those of Ansible and HomeAssistant. Code editors like VSCode also fail to recognize it, so the list items cannot be folded in the editor.

To solve this problem, you can use the snippet below to define an IndentDumper class:

import yaml

class IndentDumper(yaml.Dumper):
    def increase_indent(self, flow=False, indentless=False):
        # Ignore the indentless flag so list items are indented under their parent key.
        return super().increase_indent(flow, False)

Then pass it as the Dumper keyword argument to the yaml.dump function.

>>> print(yaml.dump(d, Dumper=IndentDumper))
a:
  - 1
  - 2
  - 3

Note that Dumper cannot be passed to yaml.safe_dump, which already sets its own dumper class.
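If you want the indentation fix without giving up the restrictions of safe_dump, one option is to subclass yaml.SafeDumper instead of yaml.Dumper and keep calling yaml.dump. The class name SafeIndentDumper below is made up for this sketch, and the output assumes a recent PyYAML where block style is the default.

```python
import yaml


class SafeIndentDumper(yaml.SafeDumper):
    """Hypothetical name: a SafeDumper that indents list items under their key."""

    def increase_indent(self, flow=False, indentless=False):
        # Same trick as IndentDumper: never use the indentless layout.
        return super().increase_indent(flow, False)


d = {'a': [1, 2, 3]}
out = yaml.dump(d, Dumper=SafeIndentDumper)
print(out)
```

Because the base class is SafeDumper, this dumper still refuses to serialize arbitrary Python objects, which is the main thing safe_dump provides.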

Output readable UTF-8 (dump)

By default, PyYAML assumes the user only wants ASCII in the output, so it escapes non-ASCII characters as \u sequences.

>>> d = {'a': '你好'}
>>> print(yaml.safe_dump(d))
a: "\u4F60\u597D"

This makes the output hard to read for humans.

In the modern world UTF-8 is widely supported, so it’s safe to write it directly in the output. Pass allow_unicode=True to yaml.safe_dump to enable that.

>>> print(yaml.safe_dump(d, allow_unicode=True))
a: 你好
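When the output goes to a file rather than a string, the file’s own encoding matters as well. The sketch below assumes the file is opened in text mode with encoding='utf-8'; the temporary path and file name are only for illustration.

```python
import os
import tempfile

import yaml

d = {'a': '你好'}

# Illustrative location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'greeting.yaml')

# Open the file as UTF-8 text so the unescaped characters are written correctly.
with open(path, 'w', encoding='utf-8') as f:
    yaml.safe_dump(d, f, allow_unicode=True)

with open(path, encoding='utf-8') as f:
    content = f.read()

# The file round-trips back to the original dict.
loaded = yaml.safe_load(content)
print(content)
```

Relying on the platform’s default encoding can fail on systems where it is not UTF-8, so stating it explicitly is the safer habit.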

No default_flow_style needed (dump)

Most of the time we don’t want flow style productions in the output (i.e. no JSON in YAML). According to PyYAML documentation, default_flow_style=False should be passed to yaml.safe_dump to achieve that.

After digging into the source code of the latest PyYAML (6.0), I found that it is no longer needed: block style is already the default. You should remove this keyword argument to keep the code cleaner and less confusing.
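Putting the dump-related tips together, a small wrapper might look like the sketch below. The function name dump_yaml and the sample config are made up for illustration, and it assumes a recent PyYAML (the version checked above was 6.0), where nested collections come out in block style without default_flow_style.

```python
import yaml


def dump_yaml(data, **kwargs):
    """Hypothetical wrapper around yaml.safe_dump with the defaults suggested here."""
    kwargs.setdefault('sort_keys', False)      # keep insertion order
    kwargs.setdefault('allow_unicode', True)   # write readable UTF-8
    return yaml.safe_dump(data, **kwargs)


config = {
    'name': '你好',
    'tags': ['web', 'api'],
    'db': {'port': 5432, 'host': 'localhost'},
}
out = dump_yaml(config)
print(out)
```

Using setdefault keeps the wrapper overridable: a caller can still pass sort_keys=True or any other safe_dump option when needed.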

Libraries

oyaml

Link: https://github.com/wimglenn/oyaml

As mentioned above, oyaml is a drop-in replacement for PyYAML which preserves dict ordering.

I suggest using oyaml if you already use PyYAML in your code.

It’s worth mentioning that oyaml is a single-file library with only 53 lines of code. This makes it very flexible to use: you can just copy the code into your own project and customize it to your needs.

strictyaml

Link: https://github.com/crdoconnor/strictyaml

Some people say YAML is too complex and flexible to be a good configuration language, but I think the problem lies not in YAML itself but in how we use it. If we restrict ourselves to a subset of its features, it can be as good as it should be.

This is where StrictYAML comes in. It is a type-safe YAML parser that parses and validates a restricted subset of the YAML specification.

I suggest using StrictYAML if you have strong security concerns for your application.

There are tons of great articles on the strictyaml documentation site. They are definitely worth a look if you have thought about YAML and other configuration languages.

ruamel.yaml

Link: https://yaml.readthedocs.io/en/latest/overview.html

ruamel.yaml is a fork of PyYAML. It was first released in 2009 and has been continuously maintained over the past decade.

The differences from PyYAML are listed in its documentation. Generally, ruamel.yaml focuses on YAML 1.2 with some opinionated enhancements to the syntax.

What interests me most is the ability to round-trip in the loading/dumping process. It works like black magic. Here’s the explanation from ruamel.yaml documentation:

A round-trip is a YAML load-modify-save sequence and ruamel.yaml tries to preserve, among others:

  • comments
  • block style and key ordering are kept, so you can diff the round-tripped source
  • flow style sequences ( ‘a: b, c, d’) (based on request and test by Anthony Sottile)
  • anchor names that are hand-crafted (i.e. not of the form idNNN)
  • merges in dictionaries are preserved

I suggest using ruamel.yaml if you have the requirement to preserve the original content as much as possible.

One thing I noticed is that ruamel.yaml’s safe_load method (YAML(typ='safe').load) cannot parse flow style collections (a: {"foo": "bar"}). This is an undocumented difference from PyYAML.

Summary

YAML has its good and bad sides. It’s easy to read and the learning curve is gentle at the beginning, but the specification is complex, which not only causes chaos in practice but also makes implementations in different languages inconsistent with each other in many small ways.

Despite these quirks, YAML is still the best configuration language for me, and as long as we can use it properly, problems will be avoided and the experience will be much better.