惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Cisco Talos Blog
Cisco Talos Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Vercel News
Vercel News
B
Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
S
Schneier on Security
Blog — PlanetScale
Blog — PlanetScale
Google DeepMind News
Google DeepMind News
博客园 - 司徒正美
NISL@THU
NISL@THU
T
Threat Research - Cisco Blogs
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Latest news
Latest news
H
Help Net Security
雷峰网
雷峰网
Spread Privacy
Spread Privacy
Cyberwarzone
Cyberwarzone
Project Zero
Project Zero
Security Latest
Security Latest
Know Your Adversary
Know Your Adversary
人人都是产品经理
人人都是产品经理
P
Privacy & Cybersecurity Law Blog
M
MIT News - Artificial intelligence
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
P
Proofpoint News Feed
U
Unit 42
大猫的无限游戏
大猫的无限游戏
A
Arctic Wolf
博客园 - 三生石上(FineUI控件)
Stack Overflow Blog
Stack Overflow Blog
D
Darknet – Hacking Tools, Hacker News & Cyber Security
C
Cybersecurity and Infrastructure Security Agency CISA
量子位
C
Cyber Attacks, Cyber Crime and Cyber Security
S
Securelist
S
Security @ Cisco Blogs
T
Threatpost
P
Palo Alto Networks Blog
C
Check Point Blog
V
Vulnerabilities – Threatpost
T
Tailwind CSS Blog
B
Blog RSS Feed
Recorded Future
Recorded Future
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
W
WeLiveSecurity
P
Proofpoint News Feed
P
Privacy International News Feed
AWS News Blog
AWS News Blog
博客园 - 叶小钗
WordPress大学
WordPress大学

博客园 - homegis

关于开源GIS和商业GIS的讨论 B树索引学习 cordova 开发 ios app 简要流程 cordova 开发 android app 简要流程 3D开源推荐:3DWebExplorer 3D开源推荐:全球卫星地图 Esri-Satellite-Map 空间网络分析开源环境的安装方法 Computing Aggregate Queries in Raster Image Databases Using Pre-Aggregated Data Cheetah:A High Performance, Custom Data Warehouse on Top of MapReduce 【转】 Ubuntu内核编译 【转】什么是数据驱动编程 DISPLAY connection problem when using ENVI/IDL in X Terminal LINUX 上 ENVI 4.7 安装步骤,IDL 调用方式 图论——网络最大流和最小截 Gfarm 安装(已测试) [转]多表连接的三种方式详解 HASH JOIN MERGE JOIN NESTED LOOP VS2005 调用 IDL7.1 方法 [转] 如何下载Google Earth中的卫星影像 【转】Envi调用MODIS Reprojection Tool(MRT)对MODIS产品进行批处理拼接、重投影、裁切 - homegis
Fuzzy C-Means Clustering【转】
homegis · 2012-09-05 · via 博客园 - homegis

linkto: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/index.html

Fuzzy C-Means Clustering

The Algorithm
Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition. It is based on minimization of the following objective function:

    ,    

where m is any real number greater than 1, uij is the degree of membership of xi in the cluster j, xi is the ith of d-dimensional measured data, cj is the d-dimension center of the cluster, and ||*|| is any norm expressing the similarity between any measured data and the center.
Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above, with the update of membership uij and the cluster centers cj by:

    ,    

This iteration will stop when , where is a termination criterion between 0 and 1, whereas k are the iteration steps. This procedure converges to a local minimum or a saddle point of Jm.
The algorithm is composed of the following steps:

  1. Initialize U=[uij] matrix, U(0)
  2. At k-step: calculate the centers vectors C(k)=[cj] with U(k)
  3. Update U(k) , U(k+1)
  4. If || U(k+1) - U(k)||< then STOP; otherwise return to step 2.

Remarks
As already told, data are bound to each cluster by means of a Membership Function, which represents the fuzzy behaviour of this algorithm. To do that, we simply have to build an appropriate matrix named U whose factors are numbers between 0 and 1, and represent the degree of membership between data and centers of clusters.
For a better understanding, we may consider this simple mono-dimensional example. Given a certain data set, suppose to represent it as distributed on an axis. The figure below shows this:

Looking at the picture, we may identify two clusters in proximity of the two data concentrations. We will refer to them using ‘A’ and ‘B’. In the first approach shown in this tutorial - the k-means algorithm - we associated each datum to a specific centroid; therefore, this membership function looked like this:

In the FCM approach, instead, the same given datum does not belong exclusively to a well defined cluster, but it can be placed in a middle way. In this case, the membership function follows a smoother line to indicate that every datum may belong to several clusters with different values of the membership coefficient.

In the figure above, the datum shown as a red marked spot belongs more to the B cluster rather than the A cluster. The value 0.2 of ‘m’ indicates the degree of membership to A for such datum. Now, instead of using a graphical representation, we introduce a matrix U whose factors are the ones taken from the membership functions:

        

(a)                                  (b)

The number of rows and columns depends on how many data and clusters we are considering. More exactly we have C = 2 columns (C = 2 clusters) and N rows, where C is the total number of clusters and N is the total number of data. The generic element is so indicated: uij.
In the examples above we have considered the k-means (a) and FCM (b) cases. We can notice that in the first case (a) the coefficients are always unitary. It is so to indicate the fact that each datum can belong only to one cluster. Other properties are shown below:

An Example
Here, we consider the simple case of a mono-dimensional application of the FCM. Twenty data and three clusters are used to initialize the algorithm and to compute the U matrix. Figures below (taken from our interactive demo) show the membership value for each datum and for each cluster. The color of the data is that of the nearest cluster according to the membership function.

In the simulation shown in the figure above we have used a fuzzyness coefficient m = 2 and we have also imposed to terminate the algorithm when . The picture shows the initial condition where the fuzzy distribution depends on the particular position of the clusters. No step is performed yet so that clusters are not identified very well. Now we can run the algorithm until the stop condition is verified. The figure below shows the final condition reached at the 8th step with m=2 and =0.3:

Is it possible to do better? Certainly, we could use an higher accuracy but we would have also to pay for a bigger computational effort. In the next figure we can see a better result having used the same initial conditions and =0.01, but we needed 37 steps!

It is also important to notice that different initializations cause different evolutions of the algorithm. In fact it could converge to the same result but probably with a different number of iteration steps.

Bibliography

  • J. C. Dunn (1973): "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", Journal of Cybernetics 3: 32-57

  • J. C. Bezdek (1981): "Pattern Recognition with Fuzzy Objective Function Algoritms", Plenum Press, New York