
Understand complex tables in one minute
程序萌部落 · 2020-12-03 · via 程序萌部落

Suppose you have an ordinary data table in front of you. It may contain thousands of rows and columns, and the entries are usually all numbers, which makes the table hard to read. Your task is to extract as much information from it as possible.

What can you do with the data? How can you find useful information in it?

Exposition

The usual approach is to compute some summary statistics. But to understand the table well, you need more meaningful features, such as how the rows cluster and how the columns relate to each other.

To do this, you can use more advanced methods such as principal component analysis (PCA). But the numerical output of these methods is still hard to interpret, so here we add visual cues on top of them.

For example, PCA gives you a compressed result: the number of rows stays the same, but the columns can be reduced to 2 or 3. Since each column is one dimension, the dataset has been transformed from high-dimensional to low-dimensional, and you can easily draw it as a scatterplot in a coordinate system.

Now you have a scatterplot in which every row is a point, and the points may gather into clusters. It is natural to read these clusters as the cluster structure of the dataset. If you assign a different color to each cluster, you can see that structure immediately.
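
The projection step can be sketched in plain Python. This is only an illustration under stated assumptions: it finds the leading directions with power iteration and deflation rather than with a library PCA routine, and the names center, covariance, top_eigenvector and pca_project are hypothetical helpers, not part of the original post.

```python
import math
import random


def center(rows):
    """Subtract the column means so every column has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Approximate the dominant eigenvector and eigenvalue by power iteration."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in the matrix
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))
    return v, eigenvalue


def pca_project(rows, n_components=2):
    """Keep every row, but compress the columns down to n_components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v, lam = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the direction just found before searching again.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]
```

Calling pca_project(table) on a list of numeric rows returns one 2-D point per row, which is what the scatterplot draws. On a real table with thousands of columns, a tested library implementation would be the more sensible choice; the sketch is only meant to show what "compressing the columns" means.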

Propelling Moment

But PCA is only one of many such algorithms. How can you be sure it gives you a faithful result? Can you trust it? The short answer is no. We need a way to test it.

<< Read more articles in https://www.cxmoe.com >>

Divergent Action Phase

Here we compute two kinds of neighbor errors by testing every point in the scatterplot.

What are neighbors? The points that lie inside a circular region around a selected point are its neighbors.
We use the classic Euclidean distance to find the neighbors of each point twice: once in the low-dimensional scatterplot and once in the original high-dimensional data.
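
The neighbor search can be sketched as follows. This is a minimal illustration, assuming a fixed radius that you pick yourself for each space (distances in the low-dimensional plot and in the original data are usually on different scales); radius_neighbors is a hypothetical helper name.

```python
import math


def radius_neighbors(points, radius):
    """For each point, return the set of indices of the other points
    whose Euclidean distance to it is at most `radius`."""
    neighbors = []
    for i, p in enumerate(points):
        near = {j for j, q in enumerate(points)
                if j != i and math.dist(p, q) <= radius}  # math.dist: Python 3.8+
        neighbors.append(near)
    return neighbors
```

You would call it once on the scatterplot coordinates and once on the original rows, each time with a radius suited to that space. The double loop is quadratic in the number of rows, which is fine for a sketch but slow for very large tables.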

Next, we compare the two sets of neighbors. If a point has neighbors in the low-dimensional scatterplot that are not its neighbors in the high-dimensional data, we call them false neighbors. If the point is missing some of its high-dimensional neighbors in the scatterplot, we call those missing neighbors.
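
With the neighbors stored as one set of indices per point, both kinds of error reduce to set differences. A minimal sketch, assuming that representation; neighbor_errors is a hypothetical name.

```python
def neighbor_errors(low_neighbors, high_neighbors):
    """Compare per-point neighbor sets from the scatterplot (low) and
    from the original data (high).

    false neighbors   = close in the plot, but not in the original data
    missing neighbors = close in the original data, but not in the plot
    """
    false_neighbors = [low - high
                       for low, high in zip(low_neighbors, high_neighbors)]
    missing_neighbors = [high - low
                         for low, high in zip(low_neighbors, high_neighbors)]
    return false_neighbors, missing_neighbors
```

A point whose two error sets are both empty is drawn faithfully by the projection, at least at the chosen radius.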

Crucial Event

Now we have two important error measures! They tell you which parts of the scatterplot you can trust and which parts are totally wrong.

Convergent Action Phase

The next question is how to show these errors on our scatterplot.

There are several ways to do it. We can use luminance to encode the errors in the original scatterplot, or we can generate a new scatterplot that uses color to show the errors.
With these methods, we get several visual views of the original dataset.
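
One possible luminance encoding is sketched below. The direction of the scale is an assumption, since the post does not fix it: here points with more errors are drawn darker, and the error count is scaled by the worst point. The names error_luminance and gray_hex are hypothetical.

```python
def error_luminance(error_sets, darkest=0.2):
    """Map each point's error count to a luminance in [darkest, 1.0].

    Assumption: 1.0 (brightest) means no errors, and the point with the
    most errors gets `darkest`.
    """
    counts = [len(s) for s in error_sets]
    worst = max(counts, default=0)
    if worst == 0:
        return [1.0 for _ in counts]
    return [1.0 - (1.0 - darkest) * c / worst for c in counts]


def gray_hex(luminance):
    """Turn a luminance in [0, 1] into a gray '#rrggbb' color string."""
    level = round(luminance * 255)
    return "#{0:02x}{0:02x}{0:02x}".format(level)
```

The resulting color strings can be passed to any plotting tool as per-point colors, once for the false neighbors and once for the missing neighbors, which gives the two error views.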

Solution

So now you have 3 views. With the PCA scatterplot, you can easily understand the clustering information. With the two error views, you can correct that first impression. In fact, this approach works as a general process: you can compute other types of views to refine your understanding further.

Coda

So, you can see that these visual methods enable you to get a good understanding of the table in a short time. That’s why we call it: understand complex tables in one minute!