Understanding Fuzzy Clustering: The Soft K-Means You Should Know
Chapter 1: Introduction to Clustering Algorithms
In a previous discussion, I highlighted an improved variant of k-means clustering known as K-means++. Today, I will delve into a different clustering algorithm: fuzzy clustering.
What Are Clustering Algorithms?
Before diving into Fuzzy Clustering, it’s essential to understand what clustering algorithms are. Clustering is a form of unsupervised learning widely used in statistical data analysis across various domains. In Data Science, we utilize clustering techniques to extract meaningful insights from datasets by observing how data points group together when a clustering algorithm is applied.
Fuzzy Clustering Explained
Fuzzy clustering is a methodology in which a data point can belong to multiple clusters simultaneously. Instead of forcing every point into exactly one cluster, as hard clustering does, the algorithm assigns each point a degree of membership in every cluster, which is often a more faithful description of data whose groups overlap or have fuzzy boundaries.
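To make the contrast with hard clustering concrete, here is a tiny illustrative example of my own (not tied to any particular library): a point sitting almost halfway between two centers gets a single all-or-nothing label from k-means, but a pair of membership degrees from fuzzy clustering.
import numpy as np

# Two cluster centers and one borderline point (values chosen purely for illustration)
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
point = np.array([1.9, 0.0])

d = np.linalg.norm(centers - point, axis=1)   # distance to each center
hard_label = np.argmin(d)                     # hard clustering: a single label
soft = (1 / d**2) / np.sum(1 / d**2)          # fuzzy memberships (fuzzifier m = 2), summing to 1

print(hard_label)  # 0 -> all-or-nothing assignment
print(soft)        # roughly [0.55, 0.45] -> the point partly belongs to both clusters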
Types of Fuzzy Clustering Algorithms
Fuzzy clustering methods can be categorized into two main types: classical fuzzy clustering and shape-based fuzzy clustering.
Classical Fuzzy Clustering Algorithms
Fuzzy C-Means (FCM): This widely used method works much like the K-Means algorithm, except that each data point belongs to every cluster with a membership degree between 0 (far from the cluster center) and 1 (at the center), and a point's memberships across all clusters sum to 1. A minimal sketch of one FCM update step follows the list of variants below. Variants include:
- Possibilistic C-Means (PCM)
- Fuzzy Possibilistic C-Means (FPCM)
- Possibilistic Fuzzy C-Means (PFCM)
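For intuition, here is a minimal NumPy sketch of a single FCM iteration, written as my own simplification rather than the skfuzzy internals: memberships are derived from relative distances via the fuzzifier m, and cluster centers are membership-weighted means.
import numpy as np

def fcm_step(X, centers, m=2.0, eps=1e-10):
    """One illustrative FCM update: memberships from distances, then new centers."""
    # Distances from every center to every point, shape (n_clusters, n_points)
    d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + eps
    # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1))
    u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)
    # Center update: fuzzy-weighted mean of the points, with weights u ** m
    w = u ** m
    new_centers = (w @ X) / w.sum(axis=1, keepdims=True)
    return u, new_centers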
Gustafson-Kessel (GK) Algorithm: Unlike C-means, which assumes roughly spherical clusters, GK gives each cluster its own covariance-based distance metric, allowing elliptically shaped clusters.
Gath-Geva Algorithm: Also known as Gaussian Mixture Decomposition, this method is akin to FCM but supports clusters of varying shapes.
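What makes GK (and Gath-Geva) different in practice is the distance measure: each cluster carries its own covariance matrix, so distance is judged relative to the cluster's orientation and spread rather than as a plain Euclidean radius. A hedged sketch with an invented covariance:
import numpy as np

# Illustrative only: an elongated cluster described by a center and a covariance matrix
center = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],    # wide along x ...
                [0.0, 0.25]])  # ... narrow along y
cov_inv = np.linalg.inv(cov)

def mahalanobis_sq(x):
    """Squared Mahalanobis distance from x to the cluster center."""
    diff = x - center
    return float(diff @ cov_inv @ diff)

# Two points at the same Euclidean distance (2.0) from the center:
print(mahalanobis_sq(np.array([2.0, 0.0])))  # 1.0  -> along the long axis, "close"
print(mahalanobis_sq(np.array([0.0, 2.0])))  # 16.0 -> across the short axis, "far"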
Shape-Based Fuzzy Clustering Algorithms
- Circular-Shaped Algorithms: These constrain clusters to circular shapes. When incorporated into FCM, the result is referred to as CS-FCM.
- Elliptical-Shaped Algorithms: These constrain points to elliptical shapes, used in the GK algorithm.
- Generic-Shaped Algorithms: Given that most real-world objects aren't strictly circular or elliptical, this algorithm accommodates clusters of any form.
Fuzzy Clustering Implementation in Python
To begin, we generate a synthetic dataset with NumPy and cluster it using the scikit-fuzzy (skfuzzy) library.
from __future__ import division, print_function
import numpy as np
import matplotlib.pyplot as plt
import skfuzzy as fuzz
colors = ['b', 'orange', 'g', 'r', 'c', 'm', 'y', 'k', 'Brown', 'ForestGreen']
# Define three cluster centers
centers = [[4, 2], [1, 7], [5, 6]]
# Define three cluster sigmas in x and y, respectively
sigmas = [[0.8, 0.3], [0.3, 0.5], [1.1, 0.7]]
# Generate test data
np.random.seed(42) # Set seed for reproducibility
xpts = np.zeros(0)   # start with empty arrays and append each cluster's points
ypts = np.zeros(0)
labels = np.zeros(0)
for i, ((xmu, ymu), (xsigma, ysigma)) in enumerate(zip(centers, sigmas)):
    xpts = np.hstack((xpts, np.random.standard_normal(200) * xsigma + xmu))
    ypts = np.hstack((ypts, np.random.standard_normal(200) * ysigma + ymu))
    labels = np.hstack((labels, np.ones(200) * i))
# Visualize the test data
fig0, ax0 = plt.subplots()
for label in range(3):
    ax0.plot(xpts[labels == label], ypts[labels == label], '.', color=colors[label])
ax0.set_title('Test Data: 200 Points per Cluster, 3 Clusters')
The video titled "Day 70 - Fuzzy C-Means Clustering Algorithm" provides a comprehensive overview of the Fuzzy C-Means algorithm, discussing its core principles and applications in clustering.
Clustering Visualization
fig1, axes1 = plt.subplots(3, 3, figsize=(8, 8))
alldata = np.vstack((xpts, ypts))
fpcs = []
for ncenters, ax in enumerate(axes1.reshape(-1), 2):
    # Fit the fuzzy c-means model for 2 through 10 cluster centers
    cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
        alldata, ncenters, 2, error=0.005, maxiter=1000, init=None)
    # Store FPC values for later
    fpcs.append(fpc)
    # Plot assigned clusters for each data point in the training set
    cluster_membership = np.argmax(u, axis=0)
    for j in range(ncenters):
        ax.plot(xpts[cluster_membership == j],
                ypts[cluster_membership == j], '.', color=colors[j])
    # Mark the center of each fuzzy cluster
    for pt in cntr:
        ax.plot(pt[0], pt[1], 'rs')
    ax.set_title('Centers = {0}; FPC = {1:.2f}'.format(ncenters, fpc))
    ax.axis('off')
fig1.tight_layout()
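Since the loop stores the fuzzy partition coefficient (FPC) for every cluster count, a short follow-up plot of those stored values (my own addition) makes the comparison explicit: the FPC lies between 0 and 1, and the cluster count whose FPC is closest to 1 partitions the data most cleanly.
# Plot the fuzzy partition coefficient against the number of cluster centers
fig_fpc, ax_fpc = plt.subplots()
ax_fpc.plot(np.r_[2:11], fpcs)
ax_fpc.set_xlabel('Number of centers')
ax_fpc.set_ylabel('Fuzzy partition coefficient')
ax_fpc.set_title('FPC vs. Number of Cluster Centers')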
Building the Model
# Regenerate fuzzy model with 3 cluster centers
cntr, u_orig, _, _, _, _, _ = fuzz.cluster.cmeans(
    alldata, 3, 2, error=0.005, maxiter=1000)
# Show 3-cluster model
fig2, ax2 = plt.subplots()
ax2.set_title('Trained Model')
for j in range(3):
    ax2.plot(alldata[0, u_orig.argmax(axis=0) == j],
             alldata[1, u_orig.argmax(axis=0) == j], 'o',
             label='series ' + str(j))
ax2.legend()
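As a quick sanity check (my addition, assuming the fit converged), the learned centers should land near the centers used to generate the data, and every point's memberships across the three clusters should sum to 1:
print(cntr)                                   # learned centers, roughly matching `centers` above
print(u_orig.shape)                           # (3, n_points): one membership row per cluster
print(np.allclose(u_orig.sum(axis=0), 1.0))   # each point's memberships sum to 1 -> True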
The subsequent video titled "Day 71 - Fuzzy C-Means Clustering Implementation" illustrates how to implement Fuzzy C-Means clustering in Python, showcasing practical coding techniques and results.
Predicting Cluster Membership
# Generate uniformly sampled data across the range [0, 10] in x and y
newdata = np.random.uniform(0, 1, (1100, 2)) * 10
# Predict new cluster membership using cmeans_predict
u, u0, d, jm, p, fpc = fuzz.cluster.cmeans_predict(
    newdata.T, cntr, 2, error=0.005, maxiter=1000)
# Visualize the classified uniform data
cluster_membership = np.argmax(u, axis=0) # Hardening for visualization
fig3, ax3 = plt.subplots()
ax3.set_title('Random Points Classified According to Known Centers')
for j in range(3):
    ax3.plot(newdata[cluster_membership == j, 0],
             newdata[cluster_membership == j, 1], 'o',
             label='series ' + str(j))
ax3.legend()
plt.show()
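Hardening with argmax discards the soft information; the membership matrix u returned by cmeans_predict keeps a degree for every cluster-point pair, so you can also inspect it directly (a brief illustration):
# Soft memberships for the first five new points: one row per cluster,
# one column per point, each column summing to 1
print(u[:, :5].round(3))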