惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

F
Full Disclosure
WordPress大学
WordPress大学
小众软件
小众软件
Cloudbric
Cloudbric
AWS News Blog
AWS News Blog
腾讯CDC
量子位
人人都是产品经理
人人都是产品经理
大猫的无限游戏
大猫的无限游戏
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
V
Vulnerabilities – Threatpost
Scott Helme
Scott Helme
Hugging Face - Blog
Hugging Face - Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
The Hacker News
The Hacker News
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
IT之家
IT之家
Jina AI
Jina AI
Attack and Defense Labs
Attack and Defense Labs
S
SegmentFault 最新的问题
Simon Willison's Weblog
Simon Willison's Weblog
The Cloudflare Blog
阮一峰的网络日志
阮一峰的网络日志
T
Tailwind CSS Blog
Last Week in AI
Last Week in AI
博客园 - 【当耐特】
Google Online Security Blog
Google Online Security Blog
美团技术团队
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
罗磊的独立博客
L
LINUX DO - 最新话题
博客园 - Franky
博客园 - 叶小钗
Apple Machine Learning Research
Apple Machine Learning Research
The Last Watchdog
The Last Watchdog
J
Java Code Geeks
AI
AI
C
Cisco Blogs
酷 壳 – CoolShell
酷 壳 – CoolShell
C
Cyber Attacks, Cyber Crime and Cyber Security
Cisco Talos Blog
Cisco Talos Blog
博客园 - 三生石上(FineUI控件)
雷峰网
雷峰网
Help Net Security
Help Net Security
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
云风的 BLOG
云风的 BLOG
I
Intezer
S
Securelist

Martin Heinz's Blog

A Guide to Python's Weak References Using weakref Module Recent Docker BuildKit Features You're Missing Out On Modern Git Commands and Features You Should Be Using Everything You Can Do with Python's textwrap Module Monitoring Indoor Air Quality with Prometheus, Grafana and a CO2 Sensor You Don't Need a Dedicated Cache Service - PostgreSQL as a Cache A Collection of Docker Images To Solve All Your Debugging Needs Weird Python "Features" That Might Catch You By Surprise Lessons Learned From Writing 100 Articles Debugging Crashes and Deadlocks in Python using PyStack Goodbye etcd, Hello PostgreSQL: Running Kubernetes with an SQL Database Remote Interactive Debugging of Python Applications Running in Kubernetes The Right Way to Run Shell Commands From Python Real Multithreading is Coming to Python - Learn How You Can Use It Now Python's Missing Batteries: Essential Libraries You're Missing Out On Kubernetes-Native Synthetic Monitoring with Kuberhealthy Make Your CLI Demos a Breeze with Zero Stress and Zero Mistakes Reduce - The Power of a Single Python Function Why I Will Never Use Alpine Linux Ever Again Cgroups - Deep Dive into Resource Management in Kubernetes Dictionary Dispatch Pattern in Python Boost Your Python Application Performance using Continuous Profiling Lazy Evaluation Using Recursive Python Generators Python Magic Methods You Haven't Heard About Getting Started with Mastodon API in Python Backup-and-Restore of Containers with Kubernetes Checkpointing API Getting Started with Google APIs in Python Python CLI Tricks That Don't Require Any Code Whatsoever All The Ways To Introspect Python Objects at Runtime What is Python's "self" Argument, Anyway? Python List Comprehensions Are More Powerful Than You Might Think You Should Be Using Python's Walrus Operator - Here's Why Recipes and Tricks for Effective Structural Pattern Matching in Python It's Time to Say Goodbye to These Obsolete Python Libraries Advanced Features of Kubernetes' Horizontal Pod Autoscaler Data and System Visualization Tools That Will Boost Your Productivity Stop Messing with Kubernetes Finalizers Automate All the Boring Kubernetes Operations with Python End-to-End Monitoring with Grafana Cloud with Minimal Effort Bitly | bit.ly/3JLmSgA Bitly | bit.ly/3uETfbi Ultimate CI Pipeline for All of Your Python Projects Bitly | bit.ly/3M30D82 Bitly | bit.ly/3oMJ6qR Bitly | bit.ly/3IRD7IK Bitly | bit.ly/3A3B69t Profiling and Analyzing Performance of Python Programs Bitly | bit.ly/30uviIM Bitly | bit.ly/3E1X2mw Bitly | bit.ly/3Dv7JxP Bitly | bit.ly/3GG1BEz Bitly | bit.ly/3lLavs4 Bitly | bit.ly/39TqP3m Bitly | bit.ly/3A5Mpx8 Bitly | bit.ly/3kGwPl4 Bitly | bit.ly/3iHtulU Bitly | bit.ly/3xGjtKS Bitly | bit.ly/3h8DZg0 Bitly | bit.ly/2RQn1dG Bitly | bit.ly/3p2B5wW Bitly | bit.ly/3tULpb0 Bitly | bit.ly/2PHVudx Bitly | bit.ly/3uPtnb0 Bitly | bit.ly/3dg3QR9 Bitly | bit.ly/3qHtSkZ Bitly | bit.ly/3kIkTPr Bitly | bit.ly/3qlRAUN Bitly | bit.ly/3pCUJ26 Hardening Docker and Kubernetes with seccomp Bitly | bit.ly/34ZhIMt Bitly | bit.ly/3qSO7h0 Bitly | bit.ly/3muGLOk Bitly | bit.ly/35xN79v Bitly | bit.ly/3mLGshK Bitly | bit.ly/2IvkGQl Bitly | bit.ly/2Sk1KFK Bitly | bit.ly/3iCNIL6 Bitly | bit.ly/3beQPpy Saving Your Linux Machine from Certain Death New Features in Python 3.9 You Should Know About Deploy Any Python Project to Kubernetes Analyzing Docker Image Security Recursive SQL Queries with PostgreSQL Automating Every Aspect of Your Python Project Tour of Python Itertools Implementing 2D Physics in Javascript Ultimate Setup for Your Next Python Project Making Python Programs Blazingly Fast Security and Cryptography Mistakes You Are Probably Doing All The Time Going Serverless with OpenFaaS and Golang - Building Optimized Templates Going Serverless with OpenFaaS and Golang - The Ultimate Setup and Workflow Setting Up Swagger Docs for Golang API Building RESTful APIs in Golang Pytest Features, That You Need in Your (Testing) Life Setting up GitHub Package Registry with Docker and Golang Ultimate Setup for Your Next Golang Project Python Tips and Trick, You Haven't Already Seen, Part 2. Tricks for Postgres and Docker that will make your life easier Getting The Most Out of Reading Books - Reading The "Professional Way" Python Tips and Trick, You Haven't Already Seen
Everything You Can Do with Python's bisect Module
Martin · 2023-11-09 · via Martin Heinz's Blog

While Python's bisect module is very simple - containing really just 2 functions - there's a lot one can do with it, including searching data efficiently, keeping any data sorted, and much more - and in this article we will explore all of it!

What is Bisect(ion)?

Before we start playing with the module, let's first explain what bisect(ion) actually is. An official definition:

Bisection is the division of a given curve, figure, or interval into two equal parts (halves).

Which in plain English means that it implements a binary search. In practice that means that we can use it to - for example - insert elements into a list while maintaining the list in sorted order:


import bisect

some_list = [0, 6, 1, 5, 8, 2]
some_list.sort()
print(some_list) # [0, 1, 2, 5, 6, 8]

i = bisect.bisect_left(some_list, 4)
print(i)  # 3

some_list.insert(i, 4)
print(some_list)  # [0, 1, 2, 4, 5, 6, 8]
# OR
bisect.insort_left(some_list, 4)
print(some_list)  # [0, 1, 2, 4, 5, 6, 8]

In this basic example, we first sort a list because we can only use functions from bisect on sorted iterable. We then use bisect_left on the list to find an index where the second argument (4) should be inserted to maintain sorted order. We then proceed to do just that - insert the number 4 at index 3. Alternatively, we can directly use insort_left function which first uses bisect_left internally and then does the insert too.

Well, that's cool, but why should you care about this module, though? Well, let me show you all the useful things you can do with it...

Binary Search

As already mentioned, bisect implements binary search so the most obvious use for it is just that:


from bisect import bisect_left

def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    pos = bisect_left(a, x, lo, hi)  # find insertion position
    return pos if pos != hi and a[pos] == x else -1  # don't walk off the end

print(binary_search([0, 1, 2, 5, 6, 8], 5))  # 3
print(binary_search([0, 1, 2, 5, 6, 8], 4))  # -1

The parameters of binary_search function above follow the same pattern as the functions in bisect module. That is - we look for value x in the list a between index lo and hi.

The only interesting line is the return statement, where we test whether the value x is actually in the list, if yes we return its position, otherwise we return -1.

Successive Equal Values

There are however more interesting things we can do with bisect module, for example finding successive equal values in a list:


from bisect import bisect_left, bisect_right

some_list = [5, 10, 15, 15, 15, 20, 25, 40]
# Find all the 15's
value = 15
start = bisect_left(some_list, value)
end = bisect_right(some_list, value)
print(f'Successive values of {value} from index {start} to {end}: {some_list[start:end]}')
# Successive values of 15 from index 2 to 5: [15, 15, 15]

We've already seen bisect_left, so here we also introduce bisect_right which does the same thing, but from the other end. This way we can locate both start and end of the span of successive values we're looking for.

Mapping from Intervals to Values

Now, let's imagine that we have a series of intervals/ranges, and we want to return corresponding ID/value. A naive, ugly solution could look something like:


def interval_to_value(val):
    if val <= 100:
        return 0
    elif 100 < val <= 300:
        return 1
    elif 300 < val <= 500:
        return 2
    elif 500 < val <= 800:
        return 3
    elif 800 < val <= 1000:
        return 4
    elif val > 1000:
        return 5

But, there's a much nicer solution using bisect_left:


def interval_to_value(val):
    return bisect_left([100, 300, 500, 800, 1000], val)

This isn't just very clean solution but also super fast. It can be also extended in case you'd need a non-natural ordering or, for example, if you wanted to return something different, like a string:


i = interval_to_value(325)
a = ['absent', 'low', 'average', 'high', 'very high', 'extreme']
print(a[i])  # average

Closest Key in Dictionary

Now, let's say we have a mapping in form of a dictionary, and we want to lookup values for a specified key. If the key exist then cool, but if it doesn't, then we want to return the value of the closest key:


import collections
some_dict = collections.OrderedDict(
    [(0, 0), (2, 1), (4, 4), (6, 9), (8, 16), (10, 25), (12, 36), (14, 49), (16, 64), (18, 81)]
)

target = 10.5
index = bisect_left(list(some_dict.keys()), target)  # 6

items = list(some_dict.items())

# Check which one is closer:
print(f'Distance for to index {index}: {distance1}')    # Distance for to index 6: 1.5
print(f'Distance for to index {index-1}: {distance2}')  # Distance for to index 5: 0.5

print('Closest value:')
if distance1 < distance2:
    print(items[index])
else:
    print(items[index-1])

# Closest value: (10, 25)

Here we use OrderedDict to make sure that we have the keys in correct order. We then use bisect_left on them to find insertion point. Finally, we need to check whether the next or the previous index is closer to the target.

Prefix Search

Another thing you can use bisect for is prefix search - let's assume we have a very large word list and want to lookup words based on a given prefix:


def prefix_search(wordlist, prefix):
    try:
        index = bisect_left(wordlist, prefix)
        return wordlist[index].startswith(prefix)
    except IndexError:
        return False


words = ['another', 'data', 'date', 'hello', 'text', 'word']

print(prefix_search(words, 'dat'))  # True
print(prefix_search(words, 'xy'))  # False

The function above only check whether a word with specified prefix exists in the list, but this could be easily modified to loop from the index and return all the words starting with prefix.

If you have large enough word list, using bisect becomes much faster in comparison to just iterating the list from the start.

Sorted Custom Objects

So far we've only used built-in types, but functions from bisect module can be also applied to custom types. Let's say that we have a list of custom objects, and we want to maintain their order in the list based on some attribute:


from bisect import insort_left

class CustomObject:
    def __init__(self, val):
        self.prop = val  # The value to compare

    def __lt__(self, other):
        return self.prop < other.prop

    def __repr__(self):
        return 'CustomObject({})'.format(self.prop)

some_objects = sorted([CustomObject(7), CustomObject(1), CustomObject(3), CustomObject(9)])

insort_left(some_objects, CustomObject(2))
print(some_objects)  # [CustomObject(1), CustomObject(2), CustomObject(3), CustomObject(7), CustomObject(9)]

This code snippet uses the fact that bisect uses __lt__ magic method to compare objects. However, having to use bisect functions all the time might be a bit inconvenient, to avoid that you could implement a SortedCollection described in this more complete example/recipe.

Key Function

Functions in bisect module also support more complicated comparison/search using the the key function parameter:


some_list = [1, 3, 7, 16, 25]
some_list.reverse()
insort_left(some_list, 10, key=lambda x: -1 * x)
print(some_list)  # [25, 16, 10, 7, 3, 1]

Here we use the key function to implement a reverse order binary search, just remember that the list also has to be sorted in reverse order to begin with.

Similarly to reverse ordering, one might be also inclined to use key function for searching list of tuples, it's however possible to do just:


list_of_tuples = [(1, 3), (3, 8), (5, 4), (10, 12)]

index = bisect_left(list_of_tuples, (5, ))  # 2
print(list_of_tuples[2])  # (5, 4)

Omitting the second value in the tuple forces bisect_left to compare only based on the first value. If you wanted to be explicit and use the key function anyway, then key=lambda i: i[0] would work too.

With that said, the key function is somewhat unintuitive - one would expect it to work like e.g. sorted function:


some_tuples = [
    (0, 10),
    (2, 12),
    (3, 15),
    (5, 20),
]

print(sorted(some_tuples, key=lambda t: t[0] + t[1]))  # Works

But it doesn't:


index = bisect_left(some_tuples, (4, 17), key=lambda t: t[0] + t[1])  # Doesn't work
# Expectation: index = 3
# Reality: "TypeError: '<' not supported between instances of 'int' and 'tuple'"

def key_func(t):
    return t[0] + t[1]

index = bisect_left(some_tuples, key_func((4, 17)), key=key_func)
print(index)  # 3

Instead, we have to define a key function, pass it as a key argument and invoke it on the second argument, too.

That's because (from docs):

key specifies a key function of one argument that is used to extract a comparison key from each element in the array. To support searching complex records, the key function is not applied to the x value.

Also see this GitHub issue in cpython repository for reasoning behind this design decision.

Conclusion

In my opinion, for such a tiny module, that's a lot of things you can use it for.

Also, besides all the above useful things you can do with bisect, I want to highlight how fast this is - especially if you have list that's already sorted. For example consider this example on Stack Overflow showing that bisect_left is much faster than in operator.

It being binary search naturally means that it runs in O(log(n)), but it's further helped by being precompiled in C, so it's always going to be faster than anything you write yourself.