惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

S
Secure Thoughts
罗磊的独立博客
T
The Blog of Author Tim Ferriss
人人都是产品经理
人人都是产品经理
博客园 - 叶小钗
Last Week in AI
Last Week in AI
美团技术团队
Google Online Security Blog
Google Online Security Blog
Application and Cybersecurity Blog
Application and Cybersecurity Blog
D
Docker
G
Google Developers Blog
大猫的无限游戏
大猫的无限游戏
酷 壳 – CoolShell
酷 壳 – CoolShell
小众软件
小众软件
月光博客
月光博客
L
LINUX DO - 最新话题
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
W
WeLiveSecurity
H
Heimdal Security Blog
Vercel News
Vercel News
SecWiki News
SecWiki News
Forbes - Security
Forbes - Security
Blog — PlanetScale
Blog — PlanetScale
Google DeepMind News
Google DeepMind News
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
www.infosecurity-magazine.com
www.infosecurity-magazine.com
TaoSecurity Blog
TaoSecurity Blog
T
Troy Hunt's Blog
A
About on SuperTechFans
C
Check Point Blog
S
Security Affairs
Hacker News - Newest:
Hacker News - Newest: "LLM"
AI
AI
WordPress大学
WordPress大学
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
Help Net Security
Help Net Security
博客园_首页
The Last Watchdog
The Last Watchdog
S
SegmentFault 最新的问题
Hugging Face - Blog
Hugging Face - Blog
Security Archives - TechRepublic
Security Archives - TechRepublic
Engineering at Meta
Engineering at Meta
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
I
Intezer
K
Kaspersky official blog
M
MIT News - Artificial intelligence
J
Java Code Geeks
G
GRAHAM CLULEY
P
Palo Alto Networks Blog

桑弧蓬矢射四方

工作总结 6 Retro of Q1 2026 [书评]《硅谷Python工程师面试指南:数据结构、算法与系统设计》 Retro of RIF 父母来美机场 checklist USA印象22: 德州海钓记 工作总结 5 [书评]《Generative AI with Amazon Bedrock》 2025规划更新 [书评] 推荐《大规模语言模型:从理论到实践》
NOT IN vs LEFT ANTI JOIN: A Performance Comparison
桑弧蓬矢射四方 · 2025-12-27 · via 桑弧蓬矢射四方

When filtering data based on exclusion criteria, the choice between NOT IN and LEFT ANTI JOIN can significantly impact query performance. This post demonstrates why LEFT ANTI JOIN is typically the better choice.

< Revised and generated with help of Claude >

Original Approach (Inefficient)

SELECT product_id, product_category
FROM products_dim
WHERE region_id = 100
    AND product_id NOT IN (
        SELECT product_id
        FROM products_dim
        WHERE region_id = 200
    )
    AND product_category IS NOT NULL
SELECT a.product_id, a.product_category
FROM products_dim a
LEFT ANTI JOIN (
    SELECT DISTINCT product_id
    FROM products_dim
    WHERE region_id = 200
) b ON a.product_id = b.product_id
WHERE a.region_id = 100
    AND a.product_category IS NOT NULL

Why This Works

Both queries return exactly the same result: products from region 100 that don’t exist in region 200.

Key Differences

Aspect NOT IN LEFT ANTI JOIN
Performance Slower, less optimized Faster, better optimized by Spark
Broadcast Risk Can trigger unwanted broadcasts Better control, prevents large broadcasts
Execution Plan Subquery execution Efficient join strategy
NULL Handling Unpredictable with NULLs Predictable behavior

Bottom Line

LEFT ANTI JOIN prevents broadcast errors while delivering the same results faster. When working with large datasets, this optimization can make a substantial difference in query execution time and resource utilization.