benchmarks and lxml
Martijn Faassen · 2005-01-24 · via Secret Weblog

The recent cElementTree release is causing some waves in the Python/XML community. It started when Uche Ogbuji posted "The Python Community has too many deceptive XML benchmarks" to his blog.

The effbot (Fredrik Lundh) was not amused, as his comment on that post and the following blog entries show:

  http://online.effbot.org/2005_01_01_archive.htm#sigh
  http://online.effbot.org/2005_01_01_archive.htm#faking-it
  http://online.effbot.org/2005_01_01_archive.htm#faking-it-2
  http://online.effbot.org/2005_01_01_archive.htm#faking-it-3

The problem is that Uche unwittingly introduced a benchmark that is rather... deceptive. He measured the time taken by the whole program, including starting up and shutting down the Python interpreter, importing modules and the like, instead of just the part where the XML processing takes place. Unless you're writing command-line scripts or classic CGI web applications, Python startup time is hardly relevant, and it shouldn't be part of the measurement.
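One way to see the fixed cost that whole-program timing includes is to time interpreter runs that process no XML at all. The following is a minimal Python 3 sketch using subprocess and time.perf_counter(); xml.etree.ElementTree is only an assumed stand-in for whichever XML library is being benchmarked.

```python
import subprocess
import sys
import time


def wall_time(args):
    """Wall-clock seconds to start a fresh interpreter, run args, and exit."""
    start = time.perf_counter()
    subprocess.run([sys.executable, *args], check=True)
    return time.perf_counter() - start


# A run that does no work at all: pure interpreter startup and shutdown.
bare = wall_time(["-c", "pass"])
# The same, plus importing an XML library, still without touching any XML.
with_import = wall_time(["-c", "import xml.etree.ElementTree"])

print(f"bare interpreter:        {bare:.3f}s")
print(f"interpreter plus import: {with_import:.3f}s")
```

Whatever these two numbers come to on a given machine, a `time python script.py` measurement adds that cost to every library it compares, which shrinks the apparent differences between them.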

A while back, while developing lxml.etree, I was curious which benchmark Fredrik was using. I couldn't find the information on the web, but he told me when I emailed him about it. He was using the simple, obvious strategy I had already been using myself:

import time
# .. other imports ..

start = time.time()  # time.clock() on Windows (Python 2); time.perf_counter() on Python 3
# .. do the actual work ..
end = time.time()
print(end - start)
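On current Python 3 the same strategy can be sketched as follows. time.clock() was removed in Python 3.8, and time.perf_counter() is the portable high-resolution replacement. The generated document, make_document() and do_work() are hypothetical stand-ins for ot.xml and the real processing; the point is that building the input happens before the clock starts, and only the parsing and filtering fall inside the timed region.

```python
import time
import xml.etree.ElementTree as ElementTree


def make_document(n_verses=3000):
    """Hypothetical stand-in for ot.xml: every 10th <v> mentions 'begat'."""
    root = ElementTree.Element("tstmt")
    for i in range(n_verses):
        verse = ElementTree.SubElement(root, "v")
        verse.text = "and Adam begat Seth" if i % 10 == 0 else "in the beginning"
    return ElementTree.tostring(root)


def do_work(data):
    """The part being benchmarked: parse the XML and filter the verses."""
    root = ElementTree.fromstring(data)
    return [v.text for v in root.iter("v") if "begat" in v.text]


data = make_document()       # setup, deliberately outside the timed region
start = time.perf_counter()  # monotonic, high-resolution clock
matches = do_work(data)
elapsed = time.perf_counter() - start
print(f"{len(matches)} matches in {elapsed:.4f}s")
```

For steadier numbers one could repeat the timed region several times and keep the minimum, which is what the standard timeit module does.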

To measure approximate memory usage, Fredrik adds a pause to the program before and after the processing, and manually checks the process's memory use in his system's process overview.
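Instead of pausing and reading a process monitor by eye, one could ask the operating system for the process's peak memory. This is a swapped-in technique, not Fredrik's: a Unix-only sketch using resource.getrusage(). Because ru_maxrss is a whole-process high-water mark, it also counts memory allocated inside C libraries such as libxml2, which Python-level tools like tracemalloc do not see; on the other hand, it can only show growth, never memory released afterwards. The 200,000-element workload is hypothetical.

```python
import resource
import sys
import xml.etree.ElementTree as ElementTree


def peak_rss_kib():
    """Peak resident set size of this process so far, in KiB (Unix only).

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / 1024 if sys.platform == "darwin" else peak


before = peak_rss_kib()
# Hypothetical workload: build a tree of 200,000 <v> elements.
root = ElementTree.Element("root")
for i in range(200_000):
    ElementTree.SubElement(root, "v").text = "verse %d" % i
after = peak_rss_kib()
print(f"peak RSS: {before:.0f} KiB before, {after:.0f} KiB after")
```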

I've reproduced his cElementTree and ElementTree results fairly well, although my machine's performance characteristics are somewhat different due to platform differences. See my other blog entries for more on this.

For fun, I thought I'd try Uche's benchmark against lxml.etree on this machine. I've also tested it against cElementTree (an older version, since I can't keep up with Fredrik's releases; I can't find a __version__ string, so I don't know which 0.9.x version it is. That reminds me to add one to lxml when the time comes for a release.)

Here's Uche's program adjusted for etree. As you can see, only the import statement needs to change:

import lxml.etree as ElementTree

tree = ElementTree.parse("ot.xml")
# ".//v" matches every <v> (verse) element anywhere in the document
for v in tree.findall(".//v"):
    text = v.text
    if text and u'begat' in text:
        print(text)
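Because lxml.etree follows the ElementTree API, the same loop runs against either library with only the import changed. The sketch below tries lxml first and falls back to the standard library's xml.etree.ElementTree, which has used its C accelerator (the descendant of cElementTree) automatically since Python 3.3. The three-verse SAMPLE document and the begat_verses() helper are hypothetical stand-ins for ot.xml and Uche's script.

```python
import io

try:
    import lxml.etree as ElementTree  # libxml2-based, if installed
except ImportError:
    import xml.etree.ElementTree as ElementTree  # standard library fallback

# Hypothetical three-verse stand-in for ot.xml.
SAMPLE = b"""<tstmt><book><chapter>
<v>And Adam lived an hundred and thirty years, and begat a son</v>
<v>And the days of Adam after he had begotten Seth were eight hundred years</v>
<v>And Seth lived an hundred and five years, and begat Enos</v>
</chapter></book></tstmt>"""


def begat_verses(source):
    """Return the text of every <v> element that contains 'begat'."""
    tree = ElementTree.parse(source)
    return [v.text for v in tree.findall(".//v") if v.text and "begat" in v.text]


for text in begat_verses(io.BytesIO(SAMPLE)):
    print(text)
```

The middle verse says "begotten", not "begat", so only the first and third verses are printed, whichever library is imported.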

I've also rewritten it to use xpath instead:

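from lxml import etree as ElementTree

tree = ElementTree.parse("ot.xml")
# libxml2 evaluates the whole query: find each <v> whose string value
# contains 'begat' and return its text nodes
for text in tree.xpath("//v[contains(., 'begat')]/text()"):
    print(text)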

Since these programs print their results, and printing overhead can be large, I've tried four test setups:

  1. Unix 'time' command, print to stdout on Gnome terminal
  2. Unix 'time' command, redirect output to file
  3. time.time(), print to stdout on Gnome terminal
  4. time.time(), redirect output to file

Here are the results:

Test              1      2      3      4
--------------------------------------------
cElementTree      1.06s  0.32s  0.9s   0.23s
lxml.etree        1.2s   0.43s  1.1s   0.36s
lxml.etree xpath  0.53s  0.25s  0.42s  0.17s

As you can see from the results, whether the output goes to a terminal or is redirected to a file matters a lot. In the xpath tests, more than half of the time is spent printing to the terminal, and for the other tests the overhead is even larger.
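Both overheads can be read straight off the table. This small calculation uses only the numbers above: the share of each time.time() measurement spent writing to the terminal (test 3 versus test 4), and the time the Unix time command saw but time.time() did not (test 2 versus test 4), which is roughly the interpreter startup, import and shutdown cost that whole-program timing folds in.

```python
# Timings copied from the results table (seconds), as
# (test 2: `time`, file output; test 3: time.time(), terminal;
#  test 4: time.time(), file output).
results = {
    "cElementTree": (0.32, 0.9, 0.23),
    "lxml.etree": (0.43, 1.1, 0.36),
    "lxml.etree xpath": (0.25, 0.42, 0.17),
}

terminal_share = {}
outside_timer = {}
for name, (time_file, clock_term, clock_file) in results.items():
    # Fraction of the time.time() measurement spent writing to the terminal.
    terminal_share[name] = (clock_term - clock_file) / clock_term
    # Time seen by the Unix `time` command but not by time.time():
    # roughly interpreter startup, imports and shutdown.
    outside_timer[name] = time_file - clock_file
    print(f"{name:17} terminal: {terminal_share[name]:.0%}  "
          f"outside timer: {outside_timer[name]:.2f}s")
```

This works out to about 74%, 67% and 60% terminal overhead for the three rows, and 0.07 to 0.09 seconds spent outside the timer.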

Also note that at last I can claim a minor victory over cElementTree on my machine, on this particular test! When it uses xpath for the task, lxml.etree is faster than this version of cElementTree. Of course, most of the credit goes to libxml2's blazingly fast xpath implementation.

All this shows that the nice thing about benchmarks is that there are so many to choose from.