Python KModes And KPrototypes Clustering

Aurum Anisa Salsabela
3 min read · Oct 6, 2021


Hello everyone! I am very happy to finally be writing again about something I promised earlier: clustering with KModes and KPrototypes. This may take longer than my previous post, for one or two reasons. For those who are already excited, so am I, so let’s get started.

In the previous story, we tried clustering with the KModes method using RStudio. Now we will compare KModes and KPrototypes clustering using Python (Jupyter Notebook).

So what is the difference between KModes and KPrototypes? In short, KModes is a clustering method for categorical data, while KPrototypes clusters data that is a mixture of numeric and categorical variables. You will find more details in the case examples that follow.
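To make the difference a bit more concrete, here is a minimal sketch of the distance ideas behind the two methods. It is only an illustration under my own assumptions, not the code inside the kmodes library: the function names, the example records, and the weight `gamma` are all hypothetical. KModes compares records by counting mismatched categories, while KPrototypes adds that count (weighted by `gamma`) to the squared distance on the numeric columns.

```python
def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def mixed_dissimilarity(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Squared Euclidean distance on the numeric part, plus gamma times
    the number of categorical mismatches."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric_part + gamma * matching_dissimilarity(cat_a, cat_b)


# Hypothetical customer records, used only for illustration.
d_cat = matching_dissimilarity(("female", "gold", "urban"),
                               ("male", "gold", "rural"))
d_mix = mixed_dissimilarity((30.0, 2.0), (33.0, 2.0),
                            ("female", "gold"), ("male", "gold"),
                            gamma=0.5)
print(d_cat)  # 2
print(d_mix)  # 9.5
```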

The first step is to open Jupyter Notebook from the Anaconda Prompt, then create a new Python notebook from the Jupyter Notebook dashboard. A screen like this will then appear:

Jupyter Notebook

Import the pandas package so that we can load the data. Then import from the kmodes library, which provides both classes we need: KModes (in kmodes.kmodes) and KPrototypes (in kmodes.kprototypes). After that, load the data that we want to cluster (here I named my data “Kmodes” and previously saved it in the Jupyter Anaconda folder).

Import Data

In the picture above, the data table contains a “data” column. This column is not needed in the clustering process, so we can delete it with the following command:

The data
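As a small sketch of that step, dropping a column in pandas can be done with `DataFrame.drop`. The other column names and the values here are hypothetical; only the “data” column comes from my table.

```python
import pandas as pd

# Hypothetical table with an unneeded "data" column.
df = pd.DataFrame({
    "data": [1, 2, 3],
    "gender": ["female", "male", "female"],
    "membership": ["gold", "basic", "basic"],
})

# drop() returns a new DataFrame, so assign the result back.
df = df.drop(columns=["data"])
print(df.columns.tolist())  # ['gender', 'membership']
```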

Next, we set the number of clusters we want and the maximum number of iterations. A higher maximum gives the algorithm more room to converge, although it stops early once the cluster assignments no longer change. With the syntax below, you can fit the model, inspect the centroids (cluster centers), and add the cluster labels as a new column beside the data table that we have.

KModes Clustering

KPrototypes

As before, we import the data, but this time we use KPrototypes. We omit the Customer ID column because it is not needed for clustering.

Import Data

Converting to float ensures that the numeric columns are stored as floating-point numbers, so that KPrototypes treats them as numeric rather than categorical.

Float
Iteration
KPrototype Clustering

With the same steps as before, we can find clusters for each customer.

That is a short continuation from me, and I hope it helps anyone who reads it, even a little. Next time, maybe how to use pivots in spreadsheets? Or it could be a linearity test with SPSS. See you, learning fighters!

“Life’s most persistent and urgent question is, ‘What are you doing for others?’” Martin Luther King, Jr.
