Tips that may save you from the hell of PyYAML
Reorx · 2022-05-15 · via Reorx’s Forge

YAML is a widely used data-serialization language. As a developer, I deal with YAML regularly. But processing YAML, especially with PyYAML in Python, is painful and full of traps. Here I want to share some tips and snippets that can make your life with PyYAML easier.

Code in this article is only guaranteed to work in Python 3.

Always use safe_load/safe_dump

YAML’s ability to construct an arbitrary Python object makes it dangerous to use blindly. It can be harmful to your application to simply yaml.load a document from an untrusted source such as the Internet or user input.

From the official PyYAML documentation:

Warning: It is not safe to call yaml.load with any data received from an untrusted source! yaml.load is as powerful as pickle.load and so may call any Python function.

In short, you should always use yaml.safe_load and yaml.safe_dump as the standard I/O methods for YAML.
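As a minimal sketch of what the safe loader protects against, the snippet below feeds it a document carrying a Python-specific tag. The payload string and the helper name load_untrusted are made up for illustration; only yaml.safe_load and yaml.YAMLError come from PyYAML.

```python
import yaml

# Hypothetical payload: a document that asks the loader to call os.system.
untrusted = "!!python/object/apply:os.system ['echo pwned']"


def load_untrusted(text):
    """Parse text with the safe loader; return None if the document is rejected."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # The safe loader has no constructor for python/* tags, so it raises
        # instead of executing anything.
        print("rejected:", type(exc).__name__)
        return None


result = load_untrusted(untrusted)  # rejected, nothing is executed
plain = load_untrusted("a: 1")      # ordinary data loads as usual
```

Plain scalars, lists, and mappings still load normally; only tags that would construct arbitrary Python objects are refused.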

Keep keys in order (load/dump)

In Python 3.7+, dicts preserve insertion order as part of the language, so the dict returned by yaml.safe_load has its keys in the same order as the original file.

>>> import yaml
>>> text = """---
... c: 1
... b: 1
... d: 1
... a: 1
... """
>>> d = yaml.safe_load(text)
>>> d
{'c': 1, 'b': 1, 'd': 1, 'a': 1}
>>> list(d)
['c', 'b', 'd', 'a']

When dumping a dict to a YAML string, pass the keyword argument sort_keys=False to preserve the key order; by default, the keys are sorted alphabetically.

>>> print(yaml.safe_dump(d))
a: 1
b: 1
c: 1
d: 1
>>> d['e'] = 1
>>> print(yaml.safe_dump(d, sort_keys=False))
c: 1
b: 1
d: 1
a: 1
e: 1

If your Python version is older, or you want key order to be preserved regardless of the Python version, you can use a library called oyaml as a drop-in replacement for PyYAML.

>>> import oyaml as yaml
>>> d = yaml.safe_load(text)
>>> d
OrderedDict([('c', 1), ('b', 1), ('d', 1), ('a', 1)])
>>> d['e'] = 1
>>> print(yaml.safe_dump(d, sort_keys=False))
c: 1
b: 1
d: 1
a: 1
e: 1

Enhance list indentation (dump)

By default, PyYAML indents list items at the same level as their parent key.

>>> d = {'a': [1, 2, 3]}
>>> print(yaml.safe_dump(d))
a:
- 1
- 2
- 3

This is not a good format according to style guides such as those of Ansible and Home Assistant. Code editors like VSCode also fail to recognize it, so the list items cannot be folded in the editor.

To solve this problem, you can use the snippet below to define an IndentDumper class:

import yaml

class IndentDumper(yaml.Dumper):
    # Ignore the caller's indentless flag so that list items are always indented.
    def increase_indent(self, flow=False, indentless=False):
        return super().increase_indent(flow, False)

Then pass it as the Dumper keyword argument to the yaml.dump function.

>>> print(yaml.dump(d, Dumper=IndentDumper))
a:
  - 1
  - 2
  - 3

Note that Dumper cannot be passed to yaml.safe_dump, which has its own dumper class defined.
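If you want the indentation fix without giving up the safe dumper, one option is to subclass yaml.SafeDumper instead and pass the subclass to yaml.dump. This is a sketch, not something the original snippet does; the class name SafeIndentDumper is made up for illustration.

```python
import yaml


class SafeIndentDumper(yaml.SafeDumper):
    """SafeDumper variant (hypothetical name) that always indents list items."""

    def increase_indent(self, flow=False, indentless=False):
        # Same trick as IndentDumper: force indentless=False.
        return super().increase_indent(flow, False)


text = yaml.dump({"a": [1, 2, 3]}, Dumper=SafeIndentDumper)
print(text)

# Because the class derives from SafeDumper, arbitrary Python objects
# are still refused instead of being serialized with python/* tags.
try:
    yaml.dump(object(), Dumper=SafeIndentDumper)
    refused = False
except yaml.YAMLError:
    refused = True
```

This keeps the behavior of yaml.safe_dump while producing the indented list layout.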

Output readable UTF-8 (dump)

By default, PyYAML assumes the user wants only ASCII in the output, so it writes non-ASCII characters as \uXXXX escape sequences.

>>> d = {'a': '你好'}
>>> print(yaml.safe_dump(d))
a: "\u4F60\u597D"

This makes the output hard to read for humans.

In the modern world, UTF-8 is widely supported, so it’s safe to write UTF-8 in the output. Pass allow_unicode=True to yaml.safe_dump to enable that.

>>> print(yaml.safe_dump(d, allow_unicode=True))
a: 你好
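When the output goes to a file rather than a string, the file encoding matters as well. The sketch below is an assumption about a typical setup, not part of the original tips: it opens the file explicitly as UTF-8 and writes to it with allow_unicode=True. The file name greeting.yaml and the temporary directory are placeholders.

```python
import os
import tempfile

import yaml

data = {"greeting": "你好"}

# Placeholder location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "greeting.yaml")

# Open the file as UTF-8 explicitly so the result does not depend on the
# platform's default encoding.
with open(path, "w", encoding="utf-8") as f:
    yaml.safe_dump(data, f, allow_unicode=True)

with open(path, encoding="utf-8") as f:
    text = f.read()

print(text)
```

Reading the file back with the same encoding and yaml.safe_load returns the original dict.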

No default_flow_style needed (dump)

Most of the time we don’t want flow-style collections in the output (i.e. no JSON in YAML). According to the PyYAML documentation, default_flow_style=False should be passed to yaml.safe_dump to achieve that.

After digging into the source code of the latest PyYAML (6.0), I found that it is no longer needed: block style is already the default. You can remove this keyword argument to keep the code cleaner and less confusing.
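A quick way to check this on your own installation is to dump a nested structure without the argument. The sketch assumes a recent PyYAML (as far as I know, the default changed in 5.1); passing default_flow_style=None is used here only to reproduce the older behavior for comparison.

```python
import yaml

data = {"a": [1, 2], "b": {"c": 1}}

# No default_flow_style argument: nested collections come out in block style.
block = yaml.safe_dump(data)
print(block)

# default_flow_style=None reproduces the older default, where leaf
# collections are written in flow (JSON-like) style.
legacy = yaml.safe_dump(data, default_flow_style=None)
print(legacy)
```

If the first output contains no brackets or braces, the keyword argument can be dropped safely.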

Libraries

oyaml

Link: https://github.com/wimglenn/oyaml

As mentioned above, oyaml is a drop-in replacement for PyYAML which preserves dict ordering.

I suggest using oyaml if you already use PyYAML in your code.

It’s worth mentioning that oyaml is a single-file library with only 53 lines of code. This makes it very flexible: you can copy the code into your own project and customize it as needed.

strictyaml

Link: https://github.com/crdoconnor/strictyaml

Some people say YAML is too complex and flexible to be a good configuration language, but I think the problem lies not in YAML itself but in how we use it. If we restrict ourselves to a subset of its features, it is as good as it should be.

This is where StrictYAML comes in. It is a type-safe YAML parser that parses and validates a restricted subset of the YAML specification.

I suggest using StrictYAML if you have strong security concerns for your application.

There are tons of great articles on the strictyaml documentation site. They are definitely worth a look if you have ever thought about YAML and other configuration languages.

ruamel.yaml

Link: https://yaml.readthedocs.io/en/latest/overview.html

ruamel.yaml is a fork of PyYAML. It was released in 2009 and has been continuously maintained over the past decade.

The differences from PyYAML are listed in its documentation. Generally, ruamel.yaml focuses on YAML 1.2 with some opinionated enhancements to the syntax.

What interests me most is the ability to round-trip in the loading/dumping process. It works like black magic. Here’s the explanation from ruamel.yaml documentation:

A round-trip is a YAML load-modify-save sequence and ruamel.yaml tries to preserve, among others:

  • comments
  • block style and key ordering are kept, so you can diff the round-tripped source
  • flow style sequences ( ‘a: b, c, d’) (based on request and test by Anthony Sottile)
  • anchor names that are hand-crafted (i.e. not of the form idNNN)
  • merges in dictionaries are preserved

I suggest using ruamel.yaml if you have the requirement to preserve the original content as much as possible.

One thing I noticed is that ruamel.yaml’s safe loading method (YAML(typ='safe').load) cannot parse flow-style collections (a: {"foo": "bar"}); this is an undocumented difference from PyYAML.

Summary

YAML has its good and bad sides. It’s easy to read, and the learning curve is gentle at the beginning, but the specification is complex, which not only causes chaos in practice but also makes implementations in different languages inconsistent with each other in many small ways.

Despite these quirks, YAML is still the best configuration language for me. As long as we use it properly, most problems can be avoided and the experience is much better.