Data Science

A deep dive into partitioning around medoids

Series: Kmeans and Its Variants

In this final article in my mini-series on k-means and its variants, I will talk about the k-medoids algorithm, also commonly called partitioning around medoids (PAM). It has the beauty of being essentially deterministic and of finding very good solutions reliably.

How to cluster noisy data sets

Series: Kmeans and Its Variants

Real-world data sets often come with many outliers that you might not be able to remove completely during the data cleanup phase. If you have run into this problem, I want to introduce you to the k-medians algorithm. By using the median instead of the mean, and using a more robust dissimilarity metric, it is much less sensitive to outliers.

A simple framework for performance metrics

The list of performance metrics is seemingly never-ending. Especially if you are new to data science, you can easily feel stranded in an ocean of choices. Learn how the metrics connect to each other and how you can use those connections to choose the best metric for your problem and model.

The Dendrite Nanomap

Synapses are at the center of neuronal communication, yet they are incredibly difficult to study. During my PhD, I created a comprehensive, quantitative model of the postsynapse at the nanometer scale