Beyond tibbles - Creating your own tibble subclass and making it work with the tidyverse

The tidyverse is one of the most popular analysis frameworks for R. But your data might need special treatment due to its particular nature. In this article, I show you how to extend the tibble class and core tidyverse functions so that they work with your data.

A deep dive into partitioning around medoids

Series: Kmeans and Its Variants

In this final article in my mini-series on k-means and its variants, I will talk about the k-medoids algorithm, also commonly called partitioning around medoids (PAM). It has the advantage of being essentially deterministic and of reliably finding very good solutions.
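To make the "essentially deterministic" point concrete, here is a minimal, simplified PAM sketch that works on a precomputed dissimilarity matrix. It is an illustration under stated assumptions, not the implementation from the article: the function name `pam` is hypothetical, the BUILD phase adds medoids greedily, and the SWAP phase accepts the first improving swap it finds (classic PAM evaluates all swaps and takes the best one). Ties are assumed to be broken by lowest index. No random numbers are involved, so the same input always yields the same medoids.

```python
from itertools import product
from typing import List, Sequence, Tuple


def pam(dist: Sequence[Sequence[float]], k: int) -> Tuple[List[int], float]:
    """Simplified PAM sketch (hypothetical helper): returns medoid indices and total cost."""
    n = len(dist)

    def cost(medoids: List[int]) -> float:
        # Total dissimilarity of every point to its closest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    # BUILD: greedily add the point that lowers the total cost the most.
    # min() returns the first minimum, so ties go to the lowest index.
    medoids: List[int] = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: cost(medoids + [c]))
        medoids.append(best)

    # SWAP: replace a medoid by a non-medoid as long as that lowers the cost.
    current = cost(medoids)
    improved = True
    while improved:
        improved = False
        for slot, h in product(range(k), range(n)):
            if h in medoids:
                continue
            candidate = medoids[:slot] + [h] + medoids[slot + 1:]
            new_cost = cost(candidate)
            if new_cost < current:
                medoids, current, improved = candidate, new_cost, True
    return sorted(medoids), current


xs = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in xs] for a in xs]
print(pam(d, 2))
```

On the one-dimensional points above, BUILD first picks the values 2 and 11, and SWAP then replaces 2 by 1, giving medoids at indices 1 and 4 (values 1 and 11) with a total cost of 4. Because every candidate is evaluated in a fixed order, rerunning the code gives exactly the same result.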

How to cluster noisy data sets

Series: Kmeans and Its Variants

Real-world data sets often come with many outliers that you might not be able to remove completely during the data cleanup phase. If you have run into this problem, I want to introduce you to the k-medians algorithm. Because it uses the median instead of the mean, together with a more robust dissimilarity metric, it is much less sensitive to outliers.
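As a rough illustration of the median-instead-of-mean idea, here is a minimal k-medians sketch in plain Python. The names `k_medians`, `assign` and `manhattan` are hypothetical, and the choice of the Manhattan (L1) distance as the "more robust dissimilarity metric" is an assumption of this sketch; the coordinate-wise median is the center that minimizes the summed L1 distance within a cluster. Initial centers can be passed explicitly or are drawn at random from the data.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def manhattan(a: Point, b: Point) -> float:
    # L1 (city-block) distance, assumed here as the robust dissimilarity.
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    # Index of the nearest center (by L1 distance) for every point.
    return [min(range(len(centers)), key=lambda j: manhattan(p, centers[j]))
            for p in points]


def k_medians(points: Sequence[Point], k: int,
              init: Optional[Sequence[Point]] = None,
              max_iter: int = 100, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    rng = random.Random(seed)
    # Start from the given centers, or from k randomly sampled data points.
    start = init if init is not None else rng.sample(list(points), k)
    centers = [list(c) for c in start]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Coordinate-wise median instead of the mean.
                new_centers.append([statistics.median(col) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


pts = [[0, 0], [0, 1], [1, 0], [1, 1],
       [10, 10], [10, 11], [11, 10], [11, 11],
       [100, 100]]  # the last point is an outlier
print(k_medians(pts, 2, init=[[0, 0], [10, 10]]))
```

With these starting centers, the outlier `[100, 100]` joins the second cluster, yet that cluster's median center stays at `[11, 11]`; the mean of the same five points would be dragged to `(28.4, 28.4)`.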

The k-means++ algorithm to kick-start your initialization

Series: Kmeans and Its Variants

k-means is a very simple and ubiquitous clustering algorithm. But quite often it fails on your problem, for example because of a poor initialization. Fortunately, an improved initialization method, k-means++, can help to alleviate this problem.
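The core of k-means++ is its seeding rule: the first center is a uniformly random data point, and every further center is drawn with probability proportional to D(x)², the squared distance from a point to its nearest already chosen center. A minimal sketch of just this seeding step follows; the function name `kmeans_pp_init` is hypothetical, and the fallback for the degenerate case where all points coincide with chosen centers is an assumption of this sketch.

```python
import random
from typing import List, Optional, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: Sequence[Sequence[float]], k: int,
                   seed: Optional[int] = None) -> List[List[float]]:
    rng = random.Random(seed)
    # 1. First center: a data point chosen uniformly at random.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # 2. D(x)^2: squared distance from each point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(list(rng.choice(points)))
            continue
        # 3. Next center: sampled with probability proportional to D(x)^2.
        (idx,) = rng.choices(range(len(points)), weights=d2)
        centers.append(list(points[idx]))
    return centers


print(kmeans_pp_init([[0, 0], [0, 1], [100, 100], [100, 101]], 2, seed=42))
```

Points that already coincide with a chosen center get weight zero, so they cannot be picked again, while far-away points are strongly favored. The returned centers then replace the random starting centers of ordinary k-means iterations.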