KModes Clustering with R

Jun 10, 2021

The following R code can be used for KModes clustering:

First, we need to install the required packages, klaR and scatterplot3d.
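A minimal sketch of the installation step could look like the following. The `requireNamespace` check is an addition of this sketch, not part of the original syntax; it only avoids reinstalling a package that is already present.

```r
# Install each package only if it is not installed yet
# (assumption: a CRAN mirror is reachable from this session)
for (pkg in c("klaR", "scatterplot3d")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# Load the packages for the current session
library(klaR)           # provides kmodes()
library(scatterplot3d)  # provides scatterplot3d()
```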

For the data to analyze, we generate random values from a binomial distribution: 3 variables with 50 objects each. Each value is a random binomial draw in the range 0–4, with a probability of 0.50.
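One way to generate such data is sketched below. The object name `dat` and the seed are assumptions made for illustration; the seed only makes the random draw reproducible.

```r
set.seed(123)  # assumption: arbitrary seed, used only for reproducibility

# 150 draws from Binomial(size = 4, prob = 0.5), so every value lies in 0..4,
# arranged as 50 objects (rows) by 3 variables (columns)
dat <- data.frame(matrix(rbinom(50 * 3, size = 4, prob = 0.5), nrow = 50, ncol = 3))

str(dat)   # columns are named X1, X2, X3 by default
head(dat)
```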

With colnames you can rename your variables; in this example we rename X1, X2, and X3 to A, B, and C. We can also use scatterplot3d to view the data as a 3D scatter plot. There are many types of pch (plotting symbol) we can use:

pch = 0, square
pch = 1, circle (default)
pch = 2, triangle point up
pch = 3, plus
pch = 4, cross
pch = 5, diamond
pch = 6, triangle point down
pch = 7, square cross
pch = 8, star
pch = 9, diamond plus
pch = 10, circle plus
pch = 11, triangles up and down
pch = 12, square plus
pch = 13, circle cross
pch = 14, square and triangle down
pch = 15, filled square
pch = 16, filled circle
pch = 17, filled triangle point up
pch = 18, filled diamond
pch = 19, solid circle
pch = 20, bullet (smaller circle)
pch = 21, filled circle (fill colour set separately, e.g. blue)
pch = 22, filled square (fill colour set separately, e.g. blue)
pch = 23, filled diamond (fill colour set separately, e.g. blue)
pch = 24, filled triangle point up (fill colour set separately, e.g. blue)
pch = 25, filled triangle point down (fill colour set separately, e.g. blue)

(https://bookdown.org/moh_rosidi2610/Metode_Numerik/dataviz.html)
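The renaming and plotting steps described above might be written as in this sketch. The data are regenerated so the block runs on its own; the object name `dat`, the seed, the symbol `pch = 16`, and the colour are illustrative choices, not the original settings.

```r
library(scatterplot3d)

set.seed(123)  # assumption: arbitrary seed
dat <- data.frame(matrix(rbinom(50 * 3, size = 4, prob = 0.5), nrow = 50))

# Rename X1, X2, X3 to A, B, C
colnames(dat) <- c("A", "B", "C")

# 3D scatter plot; pch = 16 draws filled circles
scatterplot3d(dat$A, dat$B, dat$C,
              pch = 16, color = "blue",
              xlab = "A", ylab = "B", zlab = "C")
```

Because every variable takes only the values 0 to 4, many objects fall on the same point, so the plot shows fewer than 50 distinct symbols.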

This is the final clustering result in this case, with 2 clusters. There are 33 objects in cluster 1 and 17 objects in cluster 2. The cluster modes show the value that appears most often in each cluster for each variable.

The cluster modes are the mode values of each variable. For example, in cluster 1 the value that appears most often for variable A is 1, and for variable B it is 2. The within-cluster simple-matching distance measures how far the objects in each cluster are from that cluster's modes; the simple-matching distance between an object and the modes counts the variables on which they differ.
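A sketch of the clustering call with `kmodes` from klaR is shown below. The data are regenerated with an assumed seed, and the initial modes are chosen at random, so the cluster sizes and modes it prints will generally differ from the 33 and 17 reported above.

```r
library(klaR)

set.seed(123)  # assumption: arbitrary seed
dat <- data.frame(matrix(rbinom(50 * 3, size = 4, prob = 0.5), nrow = 50))
colnames(dat) <- c("A", "B", "C")

# k-modes clustering with 2 clusters
cl <- kmodes(dat, modes = 2)

cl$size        # number of objects in each cluster
cl$modes       # most frequent value of each variable, per cluster
cl$withindiff  # within-cluster simple-matching distance, per cluster
cl$cluster     # cluster assigned to each of the 50 objects
```

Printing `cl` directly shows the same pieces in one summary.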

The output above can be summarized with the table below, which shows the cluster assigned to each object. This table can be produced using cbind, which combines vectors, matrices, or data frames column by column.
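A table of this kind could be built as in the following sketch. The names `dat`, `cl`, and `result` and the seed are assumptions for illustration.

```r
library(klaR)

set.seed(123)  # assumption: arbitrary seed
dat <- data.frame(matrix(rbinom(50 * 3, size = 4, prob = 0.5), nrow = 50))
colnames(dat) <- c("A", "B", "C")
cl <- kmodes(dat, modes = 2)

# Append the cluster label of each object as a new column
result <- cbind(dat, cluster = cl$cluster)
head(result)
```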

Besides this generated example, we can also use other kinds of data, such as our own csv or Excel file, as shown below:

Syntax for KModes clustering with a csv file

The data that we use in this case are in columns 2 to 6 of the table data_kmod. The table below shows data_kmod:

We will use columns 2 to 6 (x1 to x5), so we have 5 variables and 10 objects. We can use colnames to change the name of each column that we use.
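The csv workflow might look like the sketch below. Since the original file is not available here, the block first writes a placeholder csv with made-up values to a temporary path. The placeholder values, the `id` column, the new column names A to E, and the choice of 2 clusters are all assumptions for illustration only.

```r
library(klaR)

# Placeholder csv standing in for the real file (made-up values, not the original data)
set.seed(1)
csv_path <- tempfile(fileext = ".csv")
placeholder <- data.frame(id = 1:10,
                          matrix(sample(1:3, 10 * 5, replace = TRUE), nrow = 10))
colnames(placeholder) <- c("id", paste0("x", 1:5))
write.csv(placeholder, csv_path, row.names = FALSE)

# Read the csv file; replace csv_path with the path to your own file
data_kmod <- read.csv(csv_path)

# Keep columns 2 to 6 (x1 to x5): 5 variables, 10 objects
vars <- data_kmod[, 2:6]
colnames(vars) <- c("A", "B", "C", "D", "E")  # hypothetical new names

# Assumption: 2 clusters, as in the generated example
cl <- kmodes(vars, modes = 2)
cbind(data_kmod, cluster = cl$cluster)
```

An Excel file could be handled the same way once it is read into a data frame, for example with `read_excel` from the readxl package.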