Comics: do we know that we are not doing research in the wrong way?
Shicai Yang · 2012-12-31 · via 博客园 - Shicai Yang

by Songchun Zhu, UCLA

From a talk given at the Frontiers of Vision in 2011. Download the whole ppt.

1, How to reach the moon?

Vision is arguably one of the most challenging, and potentially most useful, problems in modern science and engineering, given its enormous complexity in knowledge representation, learning, and the computing mechanisms of biological systems. For such a complex problem, we must look for a long-term solution and be cautious that many apparently promising paths may lead to dead ends. By analogy, suppose some monkeys want to reach the moon. They may choose to (1) climb a tree (indeed, a tree could be so tall that a monkey climbs diligently for a lifetime), (2) grab the moon from a well at night, or (3) ride a hot air balloon! All these methods appear smart and are actually very cute, and they let people enjoy measurable progress over time, while the real solution (building a spacecraft) looks hopeless for a long time and appears totally ridiculous to ordinary eyes! In reality, most monkeys simply do not have the patience to learn astrophysics and rocket science, which are too complex and boring for them.

------------------------------------ Comic illustrated by my daughter Stephanie Zhu (11 years old, drawn in 2010): How to reach the moon.

[Figure: Reaching the Moon, by Stephanie]

2, Is vision a classification problem solvable by machine learning?

Some students have asked me whether vision is just an application area of machine learning (which currently means training Boosting or SVM classifiers with a large number of examples), as it appears to them. If so, all that is left for vision researchers is to design good features. Such a question is a real insult to vision and reflects the misleading research trend that poses vision as something as simple as a classification problem. This is no longer surprising to me, as the young generation not only has never heard of Ulf Grenander (the father of pattern theory), but now does not even know who David Marr (the father of computational vision) was. By analogy, machine learning, in its popular meaning, is very much like the method practiced by Chinese herbal clinics over the past three thousand years. Ancient people, who had little knowledge of modern medicine, tried hundreds of materials (roots, seeds, shells, worms, insects, etc.), just as vision people test various features. These ingredients are mixed with weights and boiled into a black and bitter soup that serves as the drug. It is believed that such a soup can cure all illnesses, including cancer, SARS and H1N1 flu. All you need is to find the right ingredients and mix them in the right proportion (weights). In theory, you can actually prove that this is true (essentially, modern medicine mixes some sort of ingredients as well), just as machine learning methods are guaranteed to solve all problems if they have enough features and examples, according to statistical theory! But the question is: with the space of ingredients so large, how do we find the right ingredients effectively? For vision, we have to study the complex structures of images, the rich spaces and their compositions, and the variety of models and representations.

--------------------------------------- Comic: the analogy between a Chinese herbal clinic and machine learning (drawn in 2008 by the painter Kun Deng and me).

[Figure: Chinese herb clinic]
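
To make the "ingredients mixed with weights" picture concrete, here is a minimal sketch of a classifier that is nothing more than a weighted sum of features. It is an illustration only: it uses a plain perceptron rather than the Boosting or SVM classifiers named above, and the feature values, labels, and function names (`train_perceptron`, `predict`) are made up for this example.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Learn one weight per feature (the 'proportions') plus a bias term."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the weights toward y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


def predict(w, b, x):
    """Classify by the sign of the weighted mixture of features."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Made-up two-feature examples; label +1 or -1 (linearly separable on purpose).
samples = [(2.0, 1.0), (3.0, 2.0), (2.5, 0.5), (0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print("weights:", w, "bias:", b)
print("all training examples correct:", predictions == labels)
```

The learner only adjusts the proportions. It says nothing about which features to put in the pot, which is the search problem raised above.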

3, Hack, Math and Stat: methods and stages in vision and other sciences.

Research methodologies in vision (and in other sciences and engineering) can be summarized as three approaches or stages: Hack, Math, and Stat. Hacks are heuristics, things that somehow work somewhere, but you cannot tell exactly how and where they work. Math is on the opposite side: it tells us that under certain conditions, things can be stated analytically or with a guarantee of performance, but often the conditions are limited and do not apply to general situations in the real world. Stat is essentially regression. With lots of parameters, you can eventually fit any data, but you lack a physical explanation. Hack, Math, and Stat are therefore different interpretations or models.

It is interesting to see examples in physics. The Chinese expedition [1405-1433] in the Ming dynasty was the most advanced of its time, when folks sailed 2/3 of the world, reaching Africa and Europe without even knowing that the Earth is round. The technique they used is called celestial navigation (see the picture below). People used the constellations to find north and the latitude. This is very much like the features we use today in object recognition. It was not precise, but it worked to some extent. A beautiful math theory appeared in the 1680s, when an apple in England was said to fall on Newton's head. The gravitational theory is simple and explained the movements of stars and planets. But it was not sophisticated enough to fully explain the motion of the moon. Newton reportedly said that the lunar theory "made his head ache and kept him awake so often that he would think of it no more". In the 1750s, it was Continental talents like Euler and a few others who came to the rescue. They invented the least squares method to fit the observational data perfectly with regression (see the equation below). Such regression equations look very familiar in machine learning.

Hack, Math, and Stat are all useful tools and methods, and often a complex solution integrates all three of them. For example, in image compression and coding, we have the math of information theory, wavelets and computational harmonic analysis at its core. Then we also use statistics for the frequency of various elements in the code book. Finally, the coding scheme contains numerous engineering hacks to make it work on real images/video. I believe that the solution to vision will rely on all three aspects.

[Figure: Hack, Math, Stat]
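
The "Stat" stage can be illustrated with a small least-squares fit. The sketch below is an illustration only, with made-up data points and hypothetical helper names (`solve`, `polyfit_ls`, `sse`); it is not the lunar-theory equation from the talk. It fits the same five points with 2 parameters (a line) and with 5 parameters (a quartic), to show how enough parameters can fit the data essentially exactly while offering no physical explanation.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def polyfit_ls(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree.

    Solves the normal equations (X^T X) c = X^T y, where X[i][j] = xs[i]**j.
    """
    k = degree + 1
    XtX = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    Xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(XtX, Xty)


def sse(xs, ys, coeffs):
    """Sum of squared errors between the data and the fitted polynomial."""
    return sum(
        (y - sum(c * x ** j for j, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


# Made-up observations, for illustration only.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0, 2.7, 2.1, 4.8, 3.9]

line = polyfit_ls(xs, ys, 1)     # 2 parameters
quartic = polyfit_ls(xs, ys, 4)  # 5 parameters, one per data point

print("line error:   ", sse(xs, ys, line))
print("quartic error:", sse(xs, ys, quartic))
```

The quartic drives the error to (numerically) zero because it has as many parameters as data points, which is the sense in which regression can fit any data without explaining it.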

---------------------------------------------------------------------------------------