惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

K
Kaspersky official blog
Martin Fowler
Martin Fowler
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
V
Visual Studio Blog
博客园_首页
Engineering at Meta
Engineering at Meta
The Cloudflare Blog
MongoDB | Blog
MongoDB | Blog
Blog — PlanetScale
Blog — PlanetScale
T
The Blog of Author Tim Ferriss
雷峰网
雷峰网
D
Docker
博客园 - 司徒正美
S
SegmentFault 最新的问题
M
MIT News - Artificial intelligence
博客园 - 叶小钗
博客园 - 三生石上(FineUI控件)
U
Unit 42
J
Java Code Geeks
A
About on SuperTechFans
N
Netflix TechBlog - Medium
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
S
Security Affairs
I
Intezer
Cisco Talos Blog
Cisco Talos Blog
C
Cyber Attacks, Cyber Crime and Cyber Security
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
B
Blog RSS Feed
P
Privacy & Cybersecurity Law Blog
T
Tenable Blog
T
Threatpost
H
Hacker News: Front Page
G
Google Developers Blog
博客园 - 【当耐特】
Hugging Face - Blog
Hugging Face - Blog
Apple Machine Learning Research
Apple Machine Learning Research
L
Lohrmann on Cybersecurity
大猫的无限游戏
大猫的无限游戏
Google DeepMind News
Google DeepMind News
A
Arctic Wolf
S
Secure Thoughts
GbyAI
GbyAI
NISL@THU
NISL@THU
S
Security @ Cisco Blogs
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
Webroot Blog
Webroot Blog
C
CXSECURITY Database RSS Feed - CXSecurity.com
O
OpenAI News
Spread Privacy
Spread Privacy
Application and Cybersecurity Blog
Application and Cybersecurity Blog

桑弧蓬矢射四方

工作总结 6 Retro of Q1 2026 [书评]《硅谷Python工程师面试指南:数据结构、算法与系统设计》 Retro of RIF 父母来美机场 checklist USA印象22: 德州海钓记 工作总结 5 [书评]《Generative AI with Amazon Bedrock》 2025规划更新 [书评] 推荐《大规模语言模型:从理论到实践》
NOT IN vs LEFT ANTI JOIN: A Performance Comparison
桑弧蓬矢射四方 · 2025-12-27 · via 桑弧蓬矢射四方

When filtering data based on exclusion criteria, the choice between NOT IN and LEFT ANTI JOIN can significantly impact query performance. This post demonstrates why LEFT ANTI JOIN is typically the better choice.

< Revised and generated with help of Claude >

Original Approach (Inefficient)

SELECT product_id, product_category
FROM products_dim
WHERE region_id = 100
    AND product_id NOT IN (
        SELECT product_id
        FROM products_dim
        WHERE region_id = 200
    )
    AND product_category IS NOT NULL
SELECT a.product_id, a.product_category
FROM products_dim a
LEFT ANTI JOIN (
    SELECT DISTINCT product_id
    FROM products_dim
    WHERE region_id = 200
) b ON a.product_id = b.product_id
WHERE a.region_id = 100
    AND a.product_category IS NOT NULL

Why This Works

Both queries return exactly the same result: products from region 100 that don’t exist in region 200.

Key Differences

Aspect NOT IN LEFT ANTI JOIN
Performance Slower, less optimized Faster, better optimized by Spark
Broadcast Risk Can trigger unwanted broadcasts Better control, prevents large broadcasts
Execution Plan Subquery execution Efficient join strategy
NULL Handling Unpredictable with NULLs Predictable behavior

Bottom Line

LEFT ANTI JOIN prevents broadcast errors while delivering the same results faster. When working with large datasets, this optimization can make a substantial difference in query execution time and resource utilization.