Understand complex tables in one minute
程序萌部落 · 2020-12-03 · via 程序萌部落

Suppose you have an ordinary data table in front of you. It may contain thousands of rows and columns, and its entries are usually plain numbers, which are hard to interpret. Your task is to get as much information out of it as possible.

What can you do with the data? How can you find useful information in it?

Exposition

The usual approach is to compute some summary statistics. But to really understand the table, you need more meaningful features, such as how the rows cluster and how the columns relate to each other.
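As a small illustration of the "summary statistics" step, here is a minimal sketch using only Python's standard library. The function name `column_summary` and the tiny example table are made up for this illustration; the choice of mean, standard deviation, min, and max is an assumption about which statistics you might start with.

```python
import statistics

def column_summary(table):
    """Return per-column mean, population standard deviation, min, and max."""
    columns = list(zip(*table))  # transpose the rows into columns
    return [
        {
            "mean": statistics.mean(col),
            "stdev": statistics.pstdev(col),
            "min": min(col),
            "max": max(col),
        }
        for col in columns
    ]

# Hypothetical table: 3 rows, 2 numeric columns.
table = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
summary = column_summary(table)
```

Numbers like these describe each column on its own, which is why they say little about clusters of rows or relationships between columns.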

To find these, you can use more advanced methods such as principal component analysis (PCA). But the numerical output of these methods is still hard to read, so here we add visual cues on top of them.

For example, PCA gives you a compressed result: the number of rows does not change, but the columns are reduced to 2 or 3. Since each column is a dimension, the dataset has been transformed from high-dimensional to low-dimensional, and you can easily draw it as a scatterplot in a coordinate system.

Now you have a scatterplot in which every point represents a row, and the points may gather into clusters. It is natural to read these clusters as the cluster structure of the dataset. And if you give each cluster its own color, you can see that structure immediately.
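To make the "rows stay, columns shrink" idea concrete, here is a minimal sketch of PCA through NumPy's singular value decomposition. The function name `pca_project` and the random example table are assumptions for illustration only; a library implementation would work just as well.

```python
import numpy as np

def pca_project(table, n_components=2):
    """Project rows onto the top principal components (rows kept, columns reduced)."""
    X = np.asarray(table, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centered columns
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
table = rng.normal(size=(100, 5))  # hypothetical table: 100 rows, 5 numeric columns
points_2d = pca_project(table)
print(points_2d.shape)  # (100, 2)
```

Each row of `points_2d` is the 2D position of the matching row in the original table, so it can be drawn directly as one point of the scatterplot.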

Propelling Moment

But PCA is just one of many algorithms, so how can you be sure it gives you a proper result? Can you trust it? The short answer is no. We need a way to test it.


Divergent Action Phase

Here we compute two kinds of neighbor errors by testing every point in the scatterplot.

What are neighbors? If other points lie within a circle around the point we selected, we call them its neighbors.
We use the classic Euclidean distance to find all neighbors of each point, both in the low-dimensional scatterplot and in the original high-dimensional data.
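The neighbor search can be sketched as follows with the standard library's `math.dist`. The function name `radius_neighbors`, the example coordinates, and the use of the same radius in both spaces are assumptions made for this illustration; in practice the radius would have to be chosen for each space.

```python
import math

def radius_neighbors(points, radius):
    """For each point, return the set of indices of other points within `radius`."""
    neighbors = []
    for i, p in enumerate(points):
        near = {
            j for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius  # Euclidean distance
        }
        neighbors.append(near)
    return neighbors

# Hypothetical data: four rows in 3 columns, and their positions in a 2D plot.
high_dim = [(0, 0, 0), (1, 0, 0), (0, 0, 5), (9, 9, 9)]
low_dim = [(0, 0), (1, 0), (0, 0.5), (9, 9)]

high_neighbors = radius_neighbors(high_dim, radius=1.5)
low_neighbors = radius_neighbors(low_dim, radius=1.5)
```

This brute-force version compares every pair of points, which is fine for a sketch but slow for tables with many thousands of rows.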

Next, we compare these two sets of neighbors. If a point has neighbors in the low-dimensional scatterplot that are not its neighbors in the high-dimensional data, we call them false neighbors. And if a point in the scatterplot is missing some of the neighbors it has in the high-dimensional data, we call those missing neighbors.
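With neighbors stored as sets of row indices, both kinds of error reduce to set differences. The sketch below assumes that representation; the function name `neighbor_errors` and the example neighbor sets are hypothetical.

```python
def neighbor_errors(low_neighbors, high_neighbors):
    """Return per-point (false_neighbors, missing_neighbors) as sets of indices."""
    errors = []
    for low, high in zip(low_neighbors, high_neighbors):
        false_neighbors = low - high    # close in the plot, not close in the data
        missing_neighbors = high - low  # close in the data, not close in the plot
        errors.append((false_neighbors, missing_neighbors))
    return errors

# Hypothetical neighbor sets for four points (indices refer to rows).
low_neighbors = [{1, 2}, {0}, {0}, set()]
high_neighbors = [{1}, {0, 3}, set(), {1}]
errors = neighbor_errors(low_neighbors, high_neighbors)
```

Here point 0 gets one false neighbor (point 2), and point 1 gets one missing neighbor (point 3).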

Crucial Event

Now we have important error information that tells you which parts of the scatterplot you can trust and which parts are totally wrong.

Convergent Action Phase

The next question is how to show these errors on our scatterplot.

There are several ways to do it. We can use luminance to encode the errors in the original scatterplot, or we can generate a new scatterplot that uses color to show the errors.
With these methods, we get several visual results for the original dataset.
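One possible luminance encoding is sketched below. It assumes each point's error is summarized as a count (for example, its number of false neighbors) and that more errors should look darker; both choices, and the function names, are assumptions for this illustration rather than a fixed rule.

```python
def error_to_luminance(error_counts):
    """Map per-point error counts to luminance in [0, 1]; more errors means darker."""
    worst = max(error_counts, default=0)
    if worst == 0:
        return [1.0 for _ in error_counts]  # no errors anywhere: full brightness
    return [1.0 - count / worst for count in error_counts]

def luminance_to_gray_hex(luminance):
    """Convert a luminance in [0, 1] to a gray hex color string."""
    level = round(luminance * 255)
    return f"#{level:02x}{level:02x}{level:02x}"

# Hypothetical false-neighbor counts for four points.
false_counts = [0, 1, 4, 2]
luminance = error_to_luminance(false_counts)
colors = [luminance_to_gray_hex(value) for value in luminance]
```

The resulting color strings can be passed to any plotting tool as per-point colors, giving one error view per kind of error.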

Solution

So now you have 3 views. Using the PCA scatterplot, you can easily understand the clustering information. And with the error views, you can easily correct that first understanding. In fact, this kind of solution can be used as a general process: we can also compute other types of views to refine our understanding.

Coda

So you can see that these visual methods enable you to get a good comprehension of the table in a short time. That's why we call it: understand complex tables in one minute!