The Magic of Clusters: A Journey Through Image Segmentation with k-means

Explore the captivating world of image segmentation, where we delve deep into the k-means clustering algorithm, offering a blend of technical insights, practical tutorials, and a poetic perspective to demystify this powerful image-processing technique.

Dec 13, 2023

Welcome to a captivating exploration of digital image segmentation! In this guide, we’ll delve into the k-means clustering algorithm, a cornerstone technique in image processing and machine learning, and learn how to create striking segmented-image GIFs of your own.

Decoding the Visual Symphony: The Art of Image Segmentation

Image segmentation is a pivotal process in the field of computer vision and image processing, where a digital image is partitioned into multiple segments or sets of pixels. This technique is akin to looking through a kaleidoscope, where a single image is broken down into distinct components, each representing a specific part or feature of the picture. The primary objective of image segmentation is to simplify or change the representation of an image into something more meaningful and easier to analyze. It’s like dissecting a complex puzzle into smaller, manageable pieces, each representing a particular object, texture, or boundary within the image.

Understanding k-means Clustering in Depth

At the core of our adventure is the k-means clustering algorithm. It’s a method used to partition a dataset (in our case, the pixels of an image) into ‘k’ distinct, non-overlapping subsets or clusters. The goal is to minimize the variance within each cluster, which for a fixed dataset is the same as maximizing the separation between clusters.
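
To make that objective concrete, here is a minimal sketch that computes the within-cluster sum of squared distances, the quantity k-means tries to drive down, for two candidate groupings of four made-up pixel colours. The function name `within_cluster_ss` and the sample values are illustrative assumptions and are not part of the repository.

```python
import numpy as np

def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for label in np.unique(labels):
        members = points[labels == label]
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total

# Four hypothetical RGB pixels: two dark, two bright
pixels = [[10, 10, 10], [20, 20, 20], [240, 240, 240], [250, 250, 250]]

good = within_cluster_ss(pixels, [0, 0, 1, 1])  # dark with dark, bright with bright
bad = within_cluster_ss(pixels, [0, 1, 0, 1])   # each cluster mixes dark and bright
print(good, bad)
```

The grouping that keeps similar colours together yields a far smaller value, which is exactly the kind of partition the algorithm searches for.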

The Intricacies of k-means

Let’s delve into the intricacies of k-means, unveiling its mathematical elegance and practical utility:

1. The Initial Step — Choosing Centroids:
The algorithm begins by randomly selecting ‘k’ initial centroids. These centroids are pivotal as they represent the core around which clusters are formed. The choice of ‘k’ is crucial and often determined by heuristic methods or domain knowledge (or aesthetics as in our case :))

2. Assignment — The Gathering of Pixels:
Each pixel in the image is then assigned to the nearest centroid. This is typically based on the Euclidean distance, a measure of the “straight line” distance between two points in a space. In our image context, this translates to grouping pixels based on colour similarity.

3. Update — The Dance of the Centroids:
Once all pixels are assigned, the position of each centroid is recalibrated. This is done by taking the mean of all the pixels assigned to that cluster. The centroid moves to the heart of its cluster.

4. Iterative Optimization — The Path to Convergence:
The assignment and update steps are repeated iteratively. With each iteration, the centroids shift, redefining the clusters. This process continues until the centroids stabilize, that is, until their positions change very little or not at all between iterations, indicating that the algorithm has converged.

5. Convergence — The Final Act:
Once the centroids stop moving significantly, indicating the minimization of intra-cluster variance, the algorithm concludes. The resulting clusters represent groups of pixels with similar colours or intensities.

6. The Outcome — A Segmented Image:
Each cluster now represents a segment of the image. The pixels in each cluster can be recoloured or manipulated to highlight the segmentation, offering a vivid visual representation of the algorithm’s effectiveness.
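
The steps above can be sketched on a toy example. The snippet below is a simplified illustration rather than the repository’s implementation: the five pixels and the two starting centroids are invented for the demonstration, the starting centroids are fixed instead of random so the run is reproducible, and the sketch assumes no cluster ever ends up empty.

```python
import numpy as np

# Five hypothetical RGB pixels and two hand-picked starting centroids
pixels = np.array([[0, 0, 0], [10, 10, 10], [20, 20, 20],
                   [200, 200, 200], [220, 220, 220]], dtype=float)
centroids = np.array([[10, 10, 10], [20, 20, 20]], dtype=float)

for step in range(10):
    # Assignment: index of the nearest centroid for every pixel
    distances = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)

    # Update: each centroid moves to the mean of its assigned pixels
    new_centroids = np.array([pixels[labels == j].mean(axis=0)
                              for j in range(len(centroids))])

    # Convergence: stop once no centroid moves
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(labels)     # cluster index of each pixel
print(centroids)  # final cluster colours
```

After a few iterations the three dark pixels share one centroid and the two bright pixels share the other, so recolouring each pixel with its centroid would produce a two-colour image.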

If you prefer a poem instead of a block of text, let’s take a moment to explore the beauty of k-means clustering in a more artistic form. Here’s a poetic interlude that weaves the essence of this algorithm into a lyrical narrative, offering a creative and whimsical glimpse into its mathematical elegance.

A Poetic Interlude on k-means

In a realm of pixels and hues,
k-means dances, clusters it brews.
Randomly chosen, centroids stand,
Gathering pixels, a colorful band.
Centroids shift, a harmonious glide,
Pixels reassigned, with each stride.
Iterations bring clusters in sight,
Till variance wanes, and clusters alight.

Repository Structure and Setup

Before diving into the practical implementation, let’s explore the structure of the GitHub repository we will be using today:


pixel-segmentation-using-k-means-main/
|-- .gitattributes
|-- README.md
|-- images/
|-- kmeans.py
|-- main.py
|-- requirements.txt

Installation and Setup:
Clone the repository and install the dependencies:


git clone https://github.com/NisargBhavsar25/pixel-segmentation-using-kmeans.git
cd pixel-segmentation-using-kmeans
pip install -r requirements.txt

Diving into the Code

The two key scripts are main.py and kmeans.py.

main.py: Orchestrates the segmentation process, applying k-means clustering for different ‘k’ values and generating a GIF.

import cv2
import numpy as np
from PIL import Image
from kmeans import KMeans
import argparse

# Set up argument parser
parser = argparse.ArgumentParser(description="Image Segmentation using k-means clustering.")
parser.add_argument('--path', type=str, default='images/input-image.jpg', help='Path to the input image')
args = parser.parse_args()

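# Read the image from the provided path (cv2.imread returns None on failure)
image = cv2.imread(args.path)
if image is None:
    raise FileNotFoundError(f"Could not read image at {args.path}")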

# Function to resize an image while maintaining aspect ratio
def resize_image(image, max_size=500):
    height, width = image.shape[:2]

    # Calculate the ratio to resize to
    if height > width:
        ratio = max_size / float(height)
    else:
        ratio = max_size / float(width)

    new_dimensions = (int(width * ratio), int(height * ratio))

    # Resize and return the image
    return cv2.resize(image, new_dimensions, interpolation=cv2.INTER_AREA)

# Resize the image to the optimum size
resized_image = resize_image(image)

raw = np.float32(resized_image.reshape((-1, 3)))

ks = [1, 2, 3, 4, 5, 6, 8, 10, 14, 16, 20, 25, 40, 50]

for k in ks:
    model = KMeans(k=k)
    model.fit(raw)
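    # 8-bit output buffer so that OpenCV can annotate and save it as a JPEG
    segmented_raw = np.zeros(raw.shape, dtype=np.uint8)

    # Replace every pixel with the colour of its nearest centroid
    for i, pixel in enumerate(raw):
        segmented_raw[i] = model.predict(pixel).astype(np.uint8)

    segmented = segmented_raw.reshape(resized_image.shape)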
    text = f"k={k}"
    text_size, _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_COMPLEX, 2, 2)
    text_x = (segmented.shape[1] - text_size[0]) // 2
    segmented = cv2.putText(segmented, text, (text_x, 50), cv2.FONT_HERSHEY_COMPLEX, 2, (255, 255, 255), 2)
    cv2.imwrite(f"images/segmented-images/k_{k}.jpg", segmented)
    print(f"k_{k}.jpg saved")

frames = [Image.open(f"images/segmented-images/k_{i}.jpg")
          for i in ks]
frame_one = frames[0]
# Append only the remaining frames so the first one is not shown twice
frame_one.save("images/output.gif", format="GIF", append_images=frames[1:],
               save_all=True, duration=750, loop=0)

print('''
-------------------------
        Done!!
-------------------------
''')
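
The flattening step in main.py (`resized_image.reshape((-1, 3))`) and the later reshape back to the image’s shape can be easier to follow on a tiny array. The sketch below uses a made-up 2x3 image instead of a real photo; only the reshape calls mirror the script.

```python
import numpy as np

# A hypothetical 2x3 image with 3 colour channels (values are arbitrary)
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape((2, 3, 3))

# Flatten to one row per pixel, as main.py does before clustering
raw = np.float32(image.reshape((-1, 3)))
print(raw.shape)  # (6, 3)

# Reshaping back with the original shape restores the pixel layout
restored = raw.astype(np.uint8).reshape(image.shape)
print(np.array_equal(restored, image))  # True
```

Because the row order is preserved, any per-pixel recolouring applied to the flat array lands on the right pixel once the array is reshaped.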

kmeans.py: Implements the k-means algorithm used in the main.py script.

import numpy as np
import random

class KMeans:
    def __init__(self, k=2, tol=0.001, max_iter=500):
        self.k = k
        self.tol = tol
        self.max_iter = max_iter

    def fit(self, data):
        if len(data) < self.k:
            raise ValueError("Number of data points must be at least k")

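        # Initialize centroids from k distinct, randomly chosen data points
        # (the indices are distinct; the colour values themselves may still repeat)
        self.centroids = {i: data[idx] for i, idx in enumerate(random.sample(range(len(data)), self.k))}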

        for _ in range(self.max_iter):
            self.classifications = {i: [] for i in range(self.k)}

            # Convert centroids to a NumPy array for vectorized operations
            centroids_array = np.array(list(self.centroids.values()))

            # Vectorized distance calculation
            distances = np.linalg.norm(data[:, np.newaxis] - centroids_array, axis=2)
            classifications = np.argmin(distances, axis=1)

            for index, classification in enumerate(classifications):
                self.classifications[classification].append(data[index])

            prev_centroids = dict(self.centroids)

            # Recalculate centroids
            for classification in self.classifications:
                if len(self.classifications[classification]) > 0:
                    self.centroids[classification] = np.mean(self.classifications[classification], axis=0)

            optimized = True

            for c in self.centroids:
                original_centroid = prev_centroids[c]
                current_centroid = self.centroids[c]

                # Not yet converged if any centroid moved farther than the tolerance
                if np.linalg.norm(current_centroid - original_centroid) > self.tol:
                    optimized = False

            if optimized:
                break

    def predict(self, data):
        centroids_array = np.array(list(self.centroids.values()))
        distances = np.linalg.norm(data - centroids_array, axis=1)
        classification = np.argmin(distances)
        return self.centroids[classification]
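
The `predict` method above handles one pixel at a time, which is why main.py loops over every pixel. As a possible alternative, the sketch below applies the same nearest-centroid rule to all pixels in one vectorized call. The helper name `predict_all` and the sample values are assumptions made for illustration; the repository does not define this function.

```python
import numpy as np

def predict_all(pixels, centroids):
    """Return, for every pixel, the colour of its nearest centroid."""
    pixels = np.asarray(pixels, dtype=np.float32)
    centroids = np.asarray(centroids, dtype=np.float32)
    # distances[i, j] is the distance from pixel i to centroid j
    distances = np.linalg.norm(pixels[:, np.newaxis] - centroids, axis=2)
    nearest = np.argmin(distances, axis=1)
    return centroids[nearest]

# Hypothetical centroids (dark and bright) and three pixels
centroids = [[10, 10, 10], [200, 200, 200]]
pixels = [[0, 0, 0], [30, 20, 25], [255, 240, 250]]
print(predict_all(pixels, centroids))
```

With a fitted model, the centroid array could be built the same way the class does internally, via `np.array(list(model.centroids.values()))`. Note that the intermediate distance array has one entry per pixel per centroid, so memory use grows with both.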

The Segmentation Tutorial

To run the main script and apply k-means clustering to your own image, run the following command from the repository folder:

python main.py --path /path/to/your/input-image.jpg

This command will execute the main.py script, and you can specify the path to your input image using the --path flag. If the flag is omitted, the script falls back to images/input-image.jpg.

After running this script, the output will be a captivating GIF that cycles through the segmented image for each value of ‘k’, saved in the images folder as output.gif. Additionally, the individual segmented image for each ‘k’ will be saved in the images/segmented-images folder as k_<k>.jpg. This lets you observe and analyze each level of segmentation in detail.

Applications and Real-World Implications

The applications of image segmentation, including segmentation through k-means clustering, extend far beyond academic interest into many real-world scenarios.

In the realm of autonomous vehicles, image segmentation is crucial for environmental perception, allowing cars to distinguish between roads, pedestrians, and obstacles, thereby ensuring safe navigation.

Image segmentation with k-means also lends itself to image compression. By grouping similar pixels into ‘k’ clusters, k-means reduces the image’s colour palette to ‘k’ colours, so each pixel can be stored as a small index into that palette rather than as a full colour value, reducing the overall data needed to represent the image.
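
As a rough back-of-the-envelope illustration of that saving, the sketch below compares the size of an uncompressed 24-bit image with a palette-based representation that stores one index per pixel plus the ‘k’ palette colours. The function name `palette_size_bytes` and the image dimensions are hypothetical, and the estimate ignores file-format overhead and any further encoding that real formats such as GIF or JPEG apply.

```python
import math

def palette_size_bytes(width, height, k):
    """Approximate size of an image stored as a k-colour palette plus per-pixel indices."""
    bits_per_index = max(1, math.ceil(math.log2(k)))
    index_bytes = math.ceil(width * height * bits_per_index / 8)
    palette_bytes = k * 3  # one RGB triple per cluster
    return index_bytes + palette_bytes

width, height = 500, 375          # hypothetical image dimensions
raw_bytes = width * height * 3    # 24 bits per pixel, uncompressed
for k in (2, 16, 50):
    print(k, palette_size_bytes(width, height, k), raw_bytes)
```

Smaller values of ‘k’ need fewer bits per pixel, which mirrors the trade-off visible in the GIF: fewer colours means less data but a coarser image.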

Conclusion

As we draw this comprehensive exploration to a close, it’s evident that the journey through k-means clustering and image segmentation is much more than a technical exercise. It’s a fascinating intersection of art and science, where mathematical precision meets creative vision. Through this tutorial, you’ve not only gained insights into the complex yet elegant k-means algorithm but also experienced firsthand how it can transform simple images into segmented masterpieces, revealing patterns and details that were once hidden.

Additional Resources

To learn more: