Python's Missing Batteries: Essential Libraries You're Missing Out On
Martin · 2023-05-02 · via Martin Heinz's Blog

Python is known to come with "batteries included", thanks to its very extensive standard library, which includes many modules and functions that you would not expect to be there. However, there are many more "essential" Python libraries out there that you should know about and be using in all of your Python projects, and here's the list.

General Purpose Utilities

We will begin with a couple of general purpose libraries that you can put to use in any type of project. The first one is boltons, which its docs describe as:

Boltons is a set of pure-Python utilities in the same spirit as — and yet conspicuously missing from — the standard library.

We would need a whole article to go over every function and feature of boltons, but here are a couple of examples of handy functions:


# pip install boltons
from boltons import jsonutils, timeutils, iterutils
from datetime import date

# {"name": "John", "id": 1, "active": true}
# {"name": "Ben", "id": 2, "active": false}
# {"name": "Mary", "id": 3, "active": true}
with open('input.jsonl') as f:
    for line in jsonutils.JSONLIterator(f):  # Automatically converted to dict
        print(f"User: {line['name']} with ID {line['id']} is {'active' if line['active'] else 'inactive'}")
        # User: John with ID 1 is active
        # ...

start_date = date(year=2023, month=4, day=9)
end_date = date(year=2023, month=4, day=30)

for day in timeutils.daterange(start_date, end_date, step=(0, 0, 2)):
    print(repr(day))
    # datetime.date(2023, 4, 9)
    # datetime.date(2023, 4, 11)
    # datetime.date(2023, 4, 13)

data = {"deeply": {"nested": {"python": {"dict": "value"}}}}
iterutils.get_path(data, ("deeply", "nested", "python"))
# {'dict': 'value'}

data = {"id": "234411",
        "node1": {"id": 1234, "value": "some data"},
        "node2": {"id": "2352345",
                  "node3": {"id": "123422", "value": "more data"}
                  }
        }

iterutils.remap(data, lambda p, k, v: (k, int(v)) if k == 'id' else (k, v))

While Python's standard library has a json module, it does not directly support the JSON Lines (.jsonl) format. The first example shows how you can process a .jsonl file using boltons.
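For comparison, here is a minimal standard-library sketch of reading JSON Lines by hand, which is roughly the boilerplate that JSONLIterator saves you. The helper name iter_jsonl is made up for this illustration and is not part of boltons or the json module; an in-memory buffer stands in for the input.jsonl file.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one parsed object per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# In-memory stand-in for open('input.jsonl')
sample = io.StringIO(
    '{"name": "John", "id": 1, "active": true}\n'
    '{"name": "Ben", "id": 2, "active": false}\n'
)
users = list(iter_jsonl(sample))
print(users[0]["name"], users[1]["active"])
```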

The second example showcases the boltons.timeutils module, which allows you to create date ranges. You can iterate over them and set the step argument to, for example, get every other day. Again, this is something that's missing from Python's datetime module.
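To see what is being replaced, the same every-other-day iteration can be sketched with only datetime. The generator name every_n_days is hypothetical, and unlike the tuple-based step used above it only handles steps measured in days.

```python
from datetime import date, timedelta


def every_n_days(start, stop, days=1):
    """Yield dates from start up to, but not including, stop (hypothetical helper)."""
    current = start
    while current < stop:
        yield current
        current += timedelta(days=days)


dates = list(every_n_days(date(2023, 4, 9), date(2023, 4, 30), days=2))
print(dates[:3])
```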

Finally, the last two examples use the boltons.iterutils module: get_path retrieves a value from a nested dictionary by its path, and remap recursively converts all id fields in a dictionary to integers. Here boltons.iterutils serves as a nice extension to the builtin itertools.
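As a rough point of reference, the id conversion could be written by hand as a small recursive function. This is only a sketch under the assumption that the data consists of plain dicts and lists; convert_ids is a made-up name, and it is not how remap is implemented.

```python
def convert_ids(node):
    """Recursively cast every value stored under an 'id' key to int."""
    if isinstance(node, dict):
        return {
            key: int(value) if key == "id" else convert_ids(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [convert_ids(item) for item in node]
    return node  # leave scalars untouched


data = {
    "id": "234411",
    "node1": {"id": 1234, "value": "some data"},
    "node2": {"id": "2352345", "node3": {"id": "123422", "value": "more data"}},
}
converted = convert_ids(data)
print(converted["node2"]["node3"])
```

A hand-written version like this has to be adapted for every new transformation, which is the kind of repetition a generic traversal helper avoids.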

Speaking of iterutils and itertools, the next great library you need to check out is more-itertools, which provides, well, more itertools. Again, a discussion of more-itertools would warrant a whole article, and I have written one.

The last one for this category is sh, which is a replacement for the subprocess module. It's great if you find yourself orchestrating lots of other processes in Python:


# https://pypi.org/project/sh/
# pip install sh
import sh

# Run any command in $PATH...
print(sh.ls('-la'))
# total 36
# drwxrwxr-x  2 martin martin  4096 apr  8 14:18 .
# drwxrwxr-x 41 martin martin 20480 apr  7 15:23 ..
# -rw-rw-r--  1 martin martin    30 apr  8 14:18 examples.py

with sh.contrib.sudo:
    # Do stuff using 'sudo'...
    ...

# Write to a file:
sh.ifconfig(_out='/tmp/interfaces')

# Piping:
print(sh.wc('-l', _in=sh.ls('.', '-1')))
# Same as 'ls -1 | wc -l'

When we invoke sh.some_command, the sh library looks for a program with that name in your $PATH. If it finds one, it simply executes it for you.

In case you need to use sudo, you can use the sudo context manager from the contrib module, as shown in the second part of the snippet.

To write the output of a command to a file, you only need to provide the _out argument to the function. And finally, you can also emulate pipes (|) by passing one command's output to another through the _in argument.
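For contrast, this is roughly what the same piping looks like with the standard subprocess module. It is a sketch, not a recommendation: the two commands are Python one-liners standing in for 'ls -1' and 'wc -l', an assumption made so that the example does not depend on which binaries are installed.

```python
import subprocess
import sys

# Stand-ins for 'ls -1' and 'wc -l' (assumption: chosen for portability)
producer = [sys.executable, "-c", "print('a.py'); print('b.py'); print('c.py')"]
counter = [sys.executable, "-c", "import sys; print(len(sys.stdin.readlines()))"]

# Run the first command and capture its output as text
listing = subprocess.run(producer, capture_output=True, text=True, check=True)

# Feed that output to the second command's stdin, like a shell pipe would
counted = subprocess.run(
    counter, input=listing.stdout, capture_output=True, text=True, check=True
)
line_count = int(counted.stdout)
print(line_count)
```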

Data Validation

Another "missing battery" in Python's standard library is the category of data validation tools. One small library that provides this is called validators. It lets you validate common patterns such as emails, IPs or credit cards:


# https://python-validators.github.io/validators/
# pip install validators
import validators

validators.email('someone@example.com')  # True
validators.card.visa('...')
validators.ip_address.ipv4('1.2.3.456')  # ValidationFailure(func=ipv4, args={'value': '1.2.3.456'})

Next up is fuzzy string comparison. Python includes difflib for this, but the module could use some improvements, some of which can be found in the thefuzz library (previously known as fuzzywuzzy):


# pip install thefuzz
from thefuzz import fuzz
from thefuzz import process

print(fuzz.ratio("Some text for testing", "text for some testing"))  # 76
print(fuzz.token_sort_ratio("Some text for testing", "text for some testing"))  # 100
print(fuzz.token_sort_ratio("Some text for testing", "some testing text for some text testing"))  # 70
print(fuzz.token_set_ratio("Some text for testing", "some testing text for some text testing"))  # 100

songs = [
    '01 Radiohead - OK Computer - Airbag.mp3',
    '02 Radiohead - OK Computer - Paranoid Android.mp3',
    '04 Radiohead - OK Computer - Exit Music (For a Film).mp3',
    '06 Radiohead - OK Computer - Karma Police.mp3',
    '10 Radiohead - OK Computer - No Surprises.mp3',
    '11 Radiohead - OK Computer - Lucky.mp3',
    '01 Radiohead - Pablo Honey - You.mp3',
    '02 Radiohead - Pablo Honey - Creep.mp3',
    '04 Radiohead - Pablo Honey - Stop Whispering.mp3',
    '06 Radiohead - Pablo Honey - Anyone Can Play Guitar.mp3',
    "10 Radiohead - Pablo Honey - I Can't.mp3",
    '13 Radiohead - Pablo Honey - Creep (Radio Edit).mp3',
    # ...
]
print(process.extract("Radiohead - No Surprises", songs, limit=1, scorer=fuzz.token_sort_ratio))
# [('10 Radiohead - OK Computer - No Surprises.mp3', 70)]

The appeal of the thefuzz library is its *ratio functions, which will likely do a better job than the builtin difflib.get_close_matches or difflib.SequenceMatcher.ratio. The snippet above shows their different uses. First we use the basic ratio, which computes a simple similarity score of two strings. After that we use token_sort_ratio, which ignores the order of tokens (words) in the string when calculating the similarity. Finally, we test the token_set_ratio function, which instead ignores duplicate tokens.

We also use the extract function from the process module, which is an alternative to difflib.get_close_matches. This function looks for the best match(es) in a list of strings.

If you're already using difflib and are wondering whether you should use thefuzz instead, make sure to check out the article by the author of the library, which nicely demonstrates why the builtin difflib is not always sufficient and why the above functions might work better.
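To make the idea of ignoring token order concrete, here is a small difflib-only sketch. The helper names are hypothetical, and the normalization (lowercase, split on whitespace, sort) is a simplification rather than the actual preprocessing thefuzz performs.

```python
from difflib import SequenceMatcher, get_close_matches


def plain_similarity(a, b):
    """Similarity of the raw strings, between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def token_sorted_similarity(a, b):
    """Similarity after lowercasing and sorting the words (hypothetical helper)."""
    def normalize(text):
        return " ".join(sorted(text.lower().split()))
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


first, second = "Some text for testing", "text for some testing"
plain = plain_similarity(first, second)
token_sorted = token_sorted_similarity(first, second)
print(plain < token_sorted)

# Best single match for a misspelled title
closest = get_close_matches("No Suprises", ["Airbag", "Karma Police", "No Surprises"], n=1)
print(closest)
```

Once the words are sorted, both strings normalize to the same text, so the word-order difference no longer lowers the score.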

Debugging

There are also quite a few debugging and troubleshooting libraries that bring a superior experience compared to what the standard library offers. One such library is stackprinter, which brings a more helpful version of Python's built-in exception messages:


# pip install stackprinter

import stackprinter
stackprinter.set_excepthook(style='darkbg2')

def do_stuff():
    some_var = "data"
    raise ValueError("Some error message")

do_stuff()

All you need to do to use it is import it and set the exception hook. Then, running code that throws an exception will result in:

[Screenshot: stackprinter traceback output]

I think this is a big improvement because it shows local variables and context, that is, things that you would otherwise need an interactive debugger for. Check out the docs for additional options, such as integration with logging or different color themes.
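If you cannot add a dependency, the standard traceback module can approximate the local-variable part, without the source context or coloring. The following is a minimal sketch; build_report is a made-up wrapper used only to keep the captured frames small.

```python
import traceback


def do_stuff():
    some_var = "data"
    raise ValueError("Some error message")


def build_report():
    """Return a formatted traceback that includes each frame's locals."""
    try:
        do_stuff()
    except ValueError as exc:
        # capture_locals=True stores repr() of the local variables per frame
        summary = traceback.TracebackException.from_exception(
            exc, capture_locals=True
        )
        return "".join(summary.format())


report = build_report()
print(report)
```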

stackprinter helps with debugging issues that result in exceptions, but that's only a small fraction of issues we all debug. Most of the time troubleshooting bugs involves just putting print or log statements all over the code to see current state of variables or to see whether the code was run at all. And there's a library that can improve upon the basic print-style debugging:


# pip install icecream
from icecream import ic

def do_stuff():
    some_var = "data"
    some_list = [1, 2, 3, 4]
    ic()
    return some_var

ic(do_stuff())
# ic| examples.py:46 in do_stuff() at 11:27:44.604
# ic| do_stuff(): 'data'

It's called icecream and it provides the ic function, which serves as a print replacement. You can use plain ic() (without arguments) to test which parts of the code were executed. Alternatively, you can use ic(some_func(...)), which will print the function/expression along with the return value.

For additional options and configuration, check out the GitHub README.
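For comparison, the closest thing in plain Python is the = specifier in f-strings (available since Python 3.8), which prints an expression next to its value. It is only a partial substitute: it does not report file, line number or time.

```python
def do_stuff():
    some_var = "data"
    return some_var


some_list = [1, 2, 3, 4]

# The '=' specifier echoes the expression text followed by repr() of its value
message = f"{do_stuff()=} {len(some_list)=}"
print(message)
# do_stuff()='data' len(some_list)=4
```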

Testing

While on the topic of debugging, we should probably also mention testing. I'm not going to tell you to use a test framework other than the builtin unittest (even though pytest is just better); instead, I want to show you 3 helpful little tools:

The first one is the freezegun library, which allows you to mock datetime:


# pip install pytest freezegun
from freezegun import freeze_time
import datetime

# Run 'pytest' in shell
@freeze_time("2022-04-09")
def test_datetime():
    assert datetime.datetime.now() == datetime.datetime(2022, 4, 9)  # Passes!


def test_with():
    with freeze_time("Apr 9th, 2022"):
        assert datetime.datetime.now() == datetime.datetime(2022, 4, 9)  # Passes!


@freeze_time("Apr 9th, 2022", tick=True)
def test_time_ticking():
    assert datetime.datetime.now() > datetime.datetime(2022, 4, 9)  # Passes!

All you need to do is add a decorator that sets the date (or datetime) to the test function. Alternatively, you can also use it as a context manager (with statement).

Above you can also see that it allows you to specify the date in a friendly format. And finally, you can also pass in tick=True, which starts the clock at the given value and lets it keep ticking.

Optionally, if you're using pytest, you can also install pytest-freezegun for pytest-style fixtures.
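To appreciate what the decorator hides, here is a hand-rolled sketch of freezing time with only unittest.mock. The names FrozenDateTime and days_until_new_year are hypothetical. Note the limitation: code that imported the class directly with "from datetime import datetime" would not see this patch.

```python
import datetime
from unittest import mock


class FrozenDateTime(datetime.datetime):
    """datetime subclass whose now() always returns 2022-04-09 (hypothetical)."""

    @classmethod
    def now(cls, tz=None):
        return cls(2022, 4, 9, tzinfo=tz)


def days_until_new_year():
    today = datetime.datetime.now()
    return (datetime.datetime(today.year + 1, 1, 1) - today).days


# Swap the class on the module for the duration of the block
with mock.patch.object(datetime, "datetime", FrozenDateTime):
    frozen_result = days_until_new_year()

print(frozen_result)
# 267
```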

The second essential testing library/helper you need is dirty-equals. It provides helper equality functions for comparing things that are kind-of equal:


# pip install dirty-equals
from dirty_equals import IsApprox, IsNow, IsJson, IsPositiveInt, IsPartialDict, IsList, AnyThing
from datetime import datetime

assert 1.0 == IsApprox(1)
assert 123 == IsApprox(120, delta=4)  # close enough...

now = datetime.now()
assert now == IsNow  # just about...

assert '{"a": 1, "b": 2}' == IsJson
assert '{"a": 1}' == IsJson(a=IsPositiveInt)

assert {'a': 1, 'b': 2, 'c': 3} == IsPartialDict(a=1, b=2)  # Validate only subset of keys/values

assert [1, 2, 3] == IsList(1, AnyThing, 3)

Above is a sample of helpers that test whether two numbers or datetimes are approximately the same; whether something is valid JSON, including testing individual keys in that JSON; or whether a value is a dictionary or a list with specific keys/values.
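The general mechanism that makes this style of assertion possible is a custom __eq__ method. The class below is a simplified illustration of that idea, with a made-up name, and is not taken from the dirty-equals source.

```python
class Approx:
    """Equal to any number within `delta` of `expected` (hypothetical helper)."""

    def __init__(self, expected, delta=0.0):
        self.expected = expected
        self.delta = delta

    def __eq__(self, other):
        if not isinstance(other, (int, float)):
            return NotImplemented
        return abs(other - self.expected) <= self.delta

    def __repr__(self):
        return f"Approx({self.expected!r}, delta={self.delta!r})"


# Python falls back to Approx.__eq__ when int/float cannot do the comparison
close_enough = 123 == Approx(120, delta=4)
too_far = 130 == Approx(120, delta=4)

# Because dict comparison uses ==, the helper also works inside nested data
nested_match = {"total": 99.5, "name": "x"} == {"total": Approx(100, delta=1), "name": "x"}
print(close_enough, too_far, nested_match)
```

Because the comparison lives in the expected value, such helpers can be dropped into nested structures without writing a separate assertion per field.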

And finally, the third helpful library is called pyperclip. It provides functions for copying and pasting to/from the clipboard. I find this very useful for debugging, e.g. to copy values of variables or error messages to the clipboard, but it can have a lot of other use cases:


# pip install pyperclip
# sudo apt-get install xclip
import pyperclip
try:
    print("Do something that throws error...")
    raise SyntaxError("Something went wrong...")
except Exception as e:
    pyperclip.copy(str(e))

# CTRL+V -> Something went wrong...

In this snippet we use pyperclip.copy to automatically copy the exception message to the clipboard, so that we don't have to copy it manually from the program output.

CLI

The last category that deserves a mention is CLI tooling. If you build CLI applications in Python, then you can probably put tqdm to good use. This little library adds a progress bar to your programs:


# pip install tqdm
from tqdm import tqdm, trange
from random import randint
from time import sleep

for i in tqdm(range(100)):
    sleep(0.05)  # 50ms per iteration

# 0%  |          | 0/100 [00:00<?, ?it/s]
# 100%|██████████| 100/100 [00:05<00:00, 19.95it/s]

with trange(100) as t:
    for i in t:
        t.set_description('Step %i' % i)
        t.set_postfix(throughput=f"{randint(100, 999)/100.00}Mb/s", task=i)
        sleep(0.05)

# Step 60:  60%|██████    | 60/100 [00:03<00:02, 19.78it/s, task=60, throughput=4.06Mb/s]

To use it, we simply wrap an iterable with tqdm and get a progress bar in the program output. For more advanced cases, you can use trange as a context manager and set additional options such as a description or custom progress bar fields, such as throughput or time elapsed.

The module can also be executed as a shell command (python -m tqdm), which could be useful e.g. when creating a backup with tar or looking for files with find.

See the docs for further advanced examples, as well as things like integrations with Pandas or Jupyter Notebook.

Closing Thoughts

With Python, you should always search for existing libraries before implementing anything yourself from scratch. Unless you're creating a particularly unusual or bespoke solution, chances are someone has already built and shared it on PyPI.

In this article I listed only general purpose libraries that anyone can benefit from, but there are many other specialized ones, e.g. for ML or web development. I would recommend that you check out https://github.com/vinta/awesome-python, which has a very extensive list of interesting libraries. You can also simply search PyPI by category, and I'm sure you will find something useful there.