惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

SecWiki News
SecWiki News
I
InfoQ
The Cloudflare Blog
人人都是产品经理
人人都是产品经理
博客园 - Franky
T
Tailwind CSS Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
量子位
博客园_首页
罗磊的独立博客
V
V2EX
李成银的技术随笔
大猫的无限游戏
大猫的无限游戏
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
True Tiger Recordings
Vercel News
Vercel News
Cyberwarzone
Cyberwarzone
Cisco Talos Blog
Cisco Talos Blog
F
Fox-IT International blog
D
Darknet – Hacking Tools, Hacker News & Cyber Security
M
Microsoft Research Blog - Microsoft Research
Know Your Adversary
Know Your Adversary
爱范儿
爱范儿
The Register - Security
The Register - Security
G
Google Developers Blog
The Hacker News
The Hacker News
Malwarebytes
Malwarebytes
S
Securelist
博客园 - 三生石上(FineUI控件)
Jina AI
Jina AI
T
Threat Research - Cisco Blogs
T
The Exploit Database - CXSecurity.com
S
SegmentFault 最新的问题
博客园 - 叶小钗
F
Fortinet All Blogs
Apple Machine Learning Research
Apple Machine Learning Research
宝玉的分享
宝玉的分享
博客园 - 聂微东
T
Threatpost
博客园 - 【当耐特】
D
Docker
P
Privacy & Cybersecurity Law Blog
www.infosecurity-magazine.com
www.infosecurity-magazine.com
G
GRAHAM CLULEY
V
Visual Studio Blog
C
Cisco Blogs
IT之家
IT之家
S
Security Archives - TechRepublic
Latest news
Latest news
阮一峰的网络日志
阮一峰的网络日志

Mox的笔记库

细嗦下MLIR的环境搭建 | Mox的笔记库 博客重构:从Hexo到Astro | Mox的笔记库 2026PPoPP MLIR Tutorial学习 | Mox的笔记库 MacOS配置《明日方舟:终末地》 | Mox的笔记库 2025:向内生长 | Mox的笔记库 由mlir::ExecutionEngine引发的跨系统问题 | Mox的笔记库 WSL2配置Cuda-Tile环境记录(未完待续) | Mox的笔记库 Vibe Coding手搓项目记录 | Mox的笔记库 给Debian上包——以DuckDB为例 | Mox的笔记库 UCPD.sys事件存档 | Mox的笔记库 换新电脑之Mac mini M4从购买到配置 | Mox的笔记库 Mac配置MLX-C开发环境 | Mox的笔记库 RISC-V meets RDBMS——RISC-V架构上可运行数据库一览 | Mox的笔记库 DuckDB Sort实现调查 | Mox的笔记库 修复Redis在树莓派5上无法运行的问题 | Mox的笔记库 如何在MLIR中自定义类型并且输出运行 | Mox的笔记库 网站网络结构变更记录 | Mox的笔记库 EDBT25论文阅读:PhoebeDB——A Disk-Based RDBMS Kernel for High-Performance and Cost-Effective OLTP SIGMOD25论文阅读:BPF-DB:——A Kernel-Embedded Transactional Database Management System For eBPF Applications SIGMOD24文章阅读:Query Compilation Without Regrets | Mox的笔记库 论文阅读:Designing an Open Framework for Query Optimization and Compilation Apache Arrow Gandiva项目解析 | Mox的笔记库 VLDB24论文阅读:Cloud-Native Database Systems and Unikernels——Reimagining OS Abstractions for Modern Hardware NoisePage源码分析(未完待续) | Mox的笔记库 VLDB20论文阅读:Mainlining Databases——Supporting Fast Transactional Workloads on Universal Columnar Data File Formats VLDB17论文阅读:Relaxed Operator Fusion for In-Memory Databases:Making Compilation, Vectorization, and Prefetching Work Together At Last SIGMOD24阅读:ROME——Robust Query Optimization via Parallel Multi-Plan Execution 文章阅读:First Past the Post-Evaluating Query Optimization in MongoDB SIGMOD文章阅读:Apache Calcite——A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources VLDB23论文阅读:Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server SIGMOD22论文阅读:Efficient Massively Parallel Join Optimization for Large Queries VLDB论文阅读:Weaving Relations for Cache Performance VLDB22论文阅读:ConnectorX——Accelerating Data Loading From Databases to Dataframes 论文阅读:UniKraft-Fast, Specialized Unikernels the Easy Way 当DuckDB遇上RISC-V | Mox的笔记库 SIGMOD25论文阅读:An Elephant Under The Microscope——Analyzing The Interaction Of Optimizer Components In PostgreSQL 论文阅读:Compile-Time Analysis of Compiler Frameworks for Query Compilation VLDB23阅读:Bringing Compiling Databases to RISC Architectures LingoDB源码编译与分析 | Mox的笔记库 淦!MLIR输出Hello World不应该这么难! | Mox的笔记库 如何愉快的运行一个MLIR程序 | Mox的笔记库 2024:拥挤年代的想象与创造 | Mox的笔记库 如何给自己的博客添加MLIR和LLVM IR语法高亮 | Mox的笔记库 VLDB19-Parsing Gigabytes of JSON per Second论文阅读 CIDR25:Runtime-Extensible Parsers阅读 | Mox的笔记库 MLIR学习资料整理 | Mox的笔记库 SIGMOD24文章阅读:VeriTxn | Mox的笔记库 VLDB23文章阅读——Exploiting Cloud Object Storage for High-Performance Analytics VLDB24——OLAP on Modern Chiplet-Based Processors走马观花阅读 VLDB22:YeSQL文章阅读(已废弃) | Mox的笔记库 如何让数据库中的Python跑的更快-VLDB22-YeSQL文章阅读 | Mox的笔记库 你好,世界! | Mox的笔记库 让系统研究更有意义:HarmonyOS NEXT的教训和经验——讲座回顾 | Mox的笔记库 UNSW 24T3 COMP9336上课记录 | Mox的笔记库 Velox开发环境配置踩坑记录 | Mox的笔记库 MLIR Toy Tutorial实践记录 | Mox的笔记库 论文阅读:Declarative Sub-Operators for Universal Data Processing LLVM-Kaleidoscope实操踩坑记录 | Mox的笔记库 2024年7月RSSHub开发体验 | Mox的笔记库 澳洲大学计算机硕士比较 | Mox的笔记库 论文阅读——CDUL:CLIP-Driven Unsupervised Learning for Multi-Label Image Classification 论批量快速添加图片与视频水印的事 | Mox的笔记库 CVPR2023-CLIP算法调研 | Mox的笔记库 基于元信息写入的服务器压力测试 | Mox的笔记库 MjAyMw==,希望,前进与平庸之道 | Mox的笔记库 家庭组网IPv6+Mesh折腾 | Mox的笔记库 code-server初体验 | Mox的笔记库 从Nginx到Caddy | Mox的笔记库 Hexo部署安装全流程回顾 | Mox的笔记库 RMM观察与初探 | Mox的笔记库 计算机网络课设——UDP/TCP/TLS Socket实验 | Mox的笔记库 JQuery的XSS初探 | Mox的笔记库 生产实习记录 | Mox的笔记库 Fedora-CoreOS配置与试用(2023年) | Mox的笔记库 Electron学习笔记 | Mox的笔记库 ServerSentEvent学习 | Mox的笔记库 报告翻译:容器云的安全挑战 | Mox的笔记库 Arch Linux迁移计划 | Mox的笔记库 Vagrant配置Metarget靶场环境 | Mox的笔记库 OpenAI-whisper折腾 | Mox的笔记库 202202,困惑,混乱与未曾设想之路 | Mox的笔记库 2022年Hack the box:Tier1免费区全解 | Mox的笔记库 Navidrome部署记录 | Mox的笔记库 长安杯2021-snake复现 | Mox的笔记库 报告概要翻译:OBFUSCATING C++ PROGRAMS VIA CONTROL FLOW FLATTENING 从零开始的Django CVE-2022-28346复现 | Mox的笔记库 2022CISCN(西北区赛)-The shinning | Mox的笔记库 Docker+QEMU+Arm64(Ubuntu)+环境配置(2022版) | Mox的笔记库 Arch Linux运行树莓派系统(2022年) | Mox的笔记库 2022CISCN初赛-ez_usb-复盘WriteUp | Mox的笔记库 NodeMCU-MicroPython配置实录 | Mox的笔记库 Django事务使用 | Mox的笔记库 记录第一次EduSRC上报 | Mox的笔记库 Jetbrain问题应急处理 | Mox的笔记库 Celery5.2学习&配置 | Mox的笔记库 Waline部署记录 | Mox的笔记库 2021年12月 Vivo千镜杯回顾 | Mox的笔记库 Frida hook初次实战 | Mox的笔记库 Log4j2漏洞复现 | Mox的笔记库 Windows的WSL2+Docker初探 | Mox的笔记库
论文阅读:How not to structure your database-backed web applications——a study of performance bugs in the wild
2025-05-17 · via Mox的笔记库

最近ByteBase的公众号推送了这篇文章,就顺带看了下。文章发于2018年的ICDE

文章地址:https://dl.acm.org/doi/10.1145/3180155.3180194

研究ORM读取数据库的性能影响,有意思

Introduction

These ORM frameworks allow developers to program such database-backed web applications in a DBMS oblivious way, as the frameworks expose APIs for developers to operate persistent data stored in the DBMS as if they are regular heap objects, with regularlooking method calls transparently translated to SQL queries by frameworks when executed.

简要的介绍了下ORM

Given the above, we target three key research questions about real-world ORM applications in this paper:

• RQ 1: How well do real-world database-backed web applications perform as the amount of application data increases?

• RQ 2: What are the common root causes of performance and scalability issues in such applications?

• RQ 3: What are the potential solutions to such issues and can they be applied automatically?

问了几个问题,可以理解为ORM的使用情况怎么样?性能怎么样?有什么问题需要解决?

Overall, our comprehensive study provides motivations and guidelines for future research to help avoid, detect, and fix crossstack performance issues in ORM applications. We have prepared a detailed replication package for all the performance-issue study, profiling, and program analysis conducted in this paper. This package is available on the webpage of our Hyperloop project [16], a project that aims to solve database-related performance problems in ORM applications.

Mark下,有空再瞅瞅:http://hyperloop.cs.uchicago.edu.

Background

Our study focuses on applications written in Ruby on Rails (Rails). Ruby is among the top 3 popular languages on GitHub [38], and Rails is among the top 3 popular web application frameworks

啊,Ruby上Github前三?(想想是2018年,那确实有可能)——难怪江湖有不少关于Ruby的传说doge

这章主要介绍Ruby生态,图看看就好

image-20250516174256439

Profiling & Study Methodology

As mentioned in Section 2, we focus on Rails applications. Since it is impractical to study all open-source Rails applications (about 200 thousand of them on GitHub [12]), we focus on 6 popular application categories1 as shown in Table 1. These 6 categories cover 90% of all Rails applications with more than 100 stars on GitHub.

image-20250516175546549

以Github上的Ruby的6类项目为例(大家比较熟悉的应该有Gitlab和Discourse,OpenSteetmap是Ruby这个我真没想到)

To profile an ORM application, we need to populate its database. Without access to the database contents in the deployed applications, we collect real-world statistics of each application based on its public website

卧槽,真这么做?“填充数据库”,这个工作量应该非常大,下面是Gitlab的数据

image-20250516213602814

we examine each application’s bug-tracking system. For 6 applications that contain fewer than 1000 bug reports, as shown in Table 4, we manually check every bug report. For applications with 1000 to 5000 bug reports, we randomly sample 100 bug reports that have been fixed and contain the keywords performance, slow, or optimization. For Redmine and Gitlab, which have more than 10,000 bug reports, we sample 200 from them in the same way.

甚至还仔细查看了BugReport

Our study of each application’s bug-tracking system does not consider bug reports that are not fixed or not clearly explained. Despite these aspects, we have made our best effort in conducting a comprehensive and unbiased study

“我们对每个应用程序的错误跟踪系统的研究并未考虑未固定或未清楚解释的错误报告”emmm,好吧,也就是实验具备时效性,现在2025年未必是有效的

Profiling Results

We identify the 10 pages with the most loading time for every application under the 20,000-record database configuration and plot their average end-to-end page loading time in Figure 2.

image-20250516214944462

As shown in Figure 3, server time contributes to at least 40% of the end-to-end-latency for more than half of the top 10 pages in all but 1 application

image-20250516220100732

服务端在端到端延迟中占据了40%,这并不意外

但下面还有一段注释,我觉得有些难评价

Part of the server time could overlap with the client time or the network time. However, our measurement shows that the overlap is negligible.

实验环境下网络环境不是问题,但实际网络延迟不会导致ORM出现其他问题么?

Causes of inefficiences

After studying the 64 performance issues in the 40 problematic actions and the 140 issues reported in the applications’ bug-tracking systems, we categorize the inefficiency causes into three categories: ORM API misuses, database design, and application design.

这三个结论:

  1. ORM API misuses
  2. Database design
  3. Application design

说了跟没说一样?😂那得看看后面怎么分析了

ORM API misuses

这一块总结为几类

  • Inefficient Computation (IC)
  • Unnecessary Computation (UC)
  • Inefficient Data Accessing (ID)

Another common problem is developers using API calls that generate queries with unnecessary ordering of the results. For example, Ror, Diaspora, and Spree developers use Object.where(c).first to get an object satisfying predicate c instead of Object.find_by(c), not realizing that the former API orders Objects by primary key after evaluating predicate c

这种情况似乎并不意外?😂ORM总是能提供多种路径实现数据读取,可是使用的SQL就未必高效

image-20250516222004684

Sometimes, queries are repeatedly issued to load the same database contents and hence are unnecessary. For instance, Figure 5 shows the patch from redmine-23334.

image-20250516222402232

Summary. While similar issues in general purpose programs can be eliminated using classic compiler optimization techniques (e.g., loop invariant motion, dead-store elimination), doing so for ORM applications is difficult as it involves understanding database queries. We are unaware of any compilers that perform such transformations.

我是不是能这么理解,上述问题都可以通过优化ORM解决,但真想要优化ORM不太可能😅比如说臭名昭著的N+1问题

这一块基本就是列举ORM的各类错误使用样例,不过里面的”Inefficient rendering”是头一回听说,文章说是“IR reflects a trade-off between readability and performance when a view file renders a set of objects.”

image-20250516224353051

Although slow rendering is complained, such transformation has not yet been proposed by issue reports. Our profiling finds such optimization speeds up 5 problematic actions by 2.5× on average.

也就是说这个问题是可以被优化的?那还行

Database Design Problems

主要分为以下几点

  • Missing Fields (MF)

  • Missing Database Indexes (MI)

这些问题在数据库设计中不可避免存在,基本可以认为是数据库设计人员的锅

Application Design Trade-offs

  • Application Design Trade-offs

Unfortunately, the lack of pagination still widely exists in latest versions of ORM applications in our study. This indicates that ORM developers need database-aware performance-estimation support to remind them of the need to use pagination in webpage design.

“remind them of the need to use pagination in webpage design”——这其实就有些难蚌

  • Application Functionality Trade-offs (FT)

这段其实我没看懂,是想说为了ORM性能不得不放弃些功能?(不会吧😶)

Fixing The Inefficiences

讲修复效果的,情况以下面这张图为准

Figure 10(a) shows the amount of servertime speedup and the sources of the speedup broken down into different anti-patterns as discussed in Section 5.

image-20250516233927725

We have reported these 64 fixes to corresponding developers. So far, we have received developers’ feedback for 14 of them, all of which have been confirmed to be true performance problems and 7 have already been fixed based on our report.

怎么说,对社区贡献还是不错的

Figure 10(b) shows the lines of code changes required to implement the fixes. The biggest change takes 56 lines of code to fix (for an inefficient rendering (IR) anti-pattern), while the smallest change requires only 1 line of code in 27 fixes. More than 78% of fixes require fewer than 5 lines. In addition, among the fixes that improve performance by 3× or more, more than 90% of them take fewer than 10 lines of code. Around 60% of fixes are intra-procedural, involving only one function.

总之就是这些问题都可以通过找有经验的专家解决——那似乎还是Mybatis那种写SQL的ORM会更好些?

Finding more API Misuses

image-20250516234606308

Anyway,API Misuse还是广泛存在的

Discussion

  • Improving ORM APIs

    这没的说,硬要说的话铁定背锅

  • Support for design and development of ORM applications.

    当然还要教育开发者😂

  • Compiler and runtime optimizations

    这个难咯,想法很好,代价很高(你们为什么不选Java.jpg)

  • Generalizing to other ORM frameworks

    基本使用ORM都会有这个问题,用ORM的一个都别跑(为什么没有Hibernate和Mybatis.jpg)

image-20250516235447164

结语

一片很棒的实验文章😂如果文章是在写不出来可以来这里找找写作思路

Ruby On Rails在国内基本没市场,这些问题我反倒是在Django上有些感觉——说白了,还是ORM降低了后端人员的门槛导致的,要是选型Java这种范式成熟的很难想象会有这种问题

结尾贴一些ByteBase发的微信文章:ORM 越方便,数据库越慢?顶会论文列举九大反模式