Evaluating Clustering in Machine Learning | by David Farrugia | Jul, 2023

A guide to why, how, and what

David Farrugia
Towards Data Science
Photo by Nareeta Martin on Unsplash

Clustering has always been one of those topics that garnered my attention. Especially when I was first getting into the whole sphere of machine learning, unsupervised clustering always carried an allure with it for me.

To put it simply, clustering is rather like the unsung knight in shining armour of machine learning. This form of unsupervised learning aims to group similar data points together.

Visualise yourself in a social gathering where everyone is a stranger.

How would you decipher the crowd?

Perhaps, by grouping individuals based on shared traits, such as those laughing at a joke, the aficionados deep in conversation, or the group captivated by a literary discussion. That’s clustering in a nutshell!

You may wonder, “Why is it relevant?”

Clustering boasts numerous applications.

  • Customer segmentation: helping businesses categorise their customers according to buying patterns to tailor their marketing approaches.
  • Anomaly detection: identifying peculiar data points, like suspicious transactions in banking.
  • Optimised resource utilisation by configuring clusters.

However, there’s a caveat.

How do we make sure that our clustering effort is successful?

How can we efficiently evaluate a clustering solution?

This is where the need for robust evaluation methods emerges.

Without a robust evaluation technique, we could end up with a model that appears promising on paper but drastically underperforms in real-world scenarios.

In this article, we’ll examine two renowned clustering evaluation methods: the Silhouette score and Density-Based Clustering Validation (DBCV). We’ll dive into their strengths, limitations, and ideal scenarios of use.
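To make the idea of scoring a clustering concrete, here is a minimal sketch of the Silhouette score in plain Python. For each point it compares `a`, the mean distance to the other points in its own cluster, with `b`, the lowest mean distance to the points of any other cluster, and scores the point as `(b - a) / max(a, b)`. The function name, the toy data, and the use of Euclidean distance are assumptions made for illustration. In practice you would more likely reach for a library routine such as scikit-learn's `sklearn.metrics.silhouette_score`. DBCV is not sketched here.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Illustrative sketch only: O(n^2) and not optimised.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to every other point in `members`.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # cohesion: how close to its own cluster
        b = min(               # separation: the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight groups that are far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
print(f"good: {good_score:.3f}, bad: {bad_score:.3f}")
```

The score always lies between -1 and 1. The labelling that matches the two visible groups should land close to 1, while the mixed labelling scores far lower, which is exactly the signal we want from an evaluation method.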
