KModes Clustering with R

Aurum Anisa Salsabela
Jun 10, 2021

The following is the R syntax that can be used for KModes clustering:

First, we need to install the required packages: klaR and scatterplot3d.
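As a minimal sketch of this setup step, the snippet below installs the two packages only when they are missing and then loads them. The helper variable names (`pkgs`, `missing_pkgs`) are my own and not from the original post.

```r
# Install klaR and scatterplot3d only if they are not already available.
pkgs <- c("klaR", "scatterplot3d")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

library(klaR)           # provides kmodes()
library(scatterplot3d)  # provides scatterplot3d()
```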

For the data to analyze, we generate random values from a binomial distribution: 3 variables with 50 objects each. The values range from 0 to 4, and the probability we use is 0.50.
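One way this data could be generated is sketched below with base R's `rbinom`. The seed and the object name `data_binom` are assumptions for illustration; the original code may differ.

```r
set.seed(123)  # assumption: any seed works, it only makes the draw repeatable

n_objects <- 50

# Each variable is 50 draws from Binomial(size = 4, prob = 0.5),
# so every value lies between 0 and 4.
data_binom <- data.frame(
  X1 = rbinom(n_objects, size = 4, prob = 0.5),
  X2 = rbinom(n_objects, size = 4, prob = 0.5),
  X3 = rbinom(n_objects, size = 4, prob = 0.5)
)

dim(data_binom)            # rows (objects) and columns (variables)
range(unlist(data_binom))  # smallest and largest generated value
```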

R syntax for KModes with binomial data
the data

By using colnames, you can rename your variables; in this example we rename the variables X1, X2, and X3 to A, B, and C. We can also use scatterplot3d to view the data as a 3D scatter plot. There are many types of pch (plotting symbol) we can use:

  • pch = 0, square
  • pch = 1, circle (default)
  • pch = 2, triangle point up
  • pch = 3, plus
  • pch = 4, cross
  • pch = 5, diamond
  • pch = 6, triangle point down
  • pch = 7, square cross
  • pch = 8, star
  • pch = 9, diamond plus
  • pch = 10, circle plus
  • pch = 11, triangles up and down
  • pch = 12, square plus
  • pch = 13, circle cross
  • pch = 14, square and triangle down
  • pch = 15, filled square
  • pch = 16, filled circle
  • pch = 17, filled triangle point up
  • pch = 18, filled diamond
  • pch = 19, solid circle
  • pch = 20, bullet (smaller circle)
  • pch = 21, circle with fill color (set with bg)
  • pch = 22, square with fill color (set with bg)
  • pch = 23, diamond with fill color (set with bg)
  • pch = 24, triangle point up with fill color (set with bg)
  • pch = 25, triangle point down with fill color (set with bg)

(https://bookdown.org/moh_rosidi2610/Metode_Numerik/dataviz.html)

scatterplot 3D
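The renaming and plotting steps described above might look roughly like the following. This is a sketch under assumptions: the data is regenerated here so the block runs on its own, and the symbol (`pch = 16`), color, and title are arbitrary choices rather than the ones used in the original figure.

```r
library(scatterplot3d)

set.seed(123)  # assumption: arbitrary seed for a repeatable example
data_binom <- data.frame(
  X1 = rbinom(50, size = 4, prob = 0.5),
  X2 = rbinom(50, size = 4, prob = 0.5),
  X3 = rbinom(50, size = 4, prob = 0.5)
)

# Rename X1, X2, X3 to A, B, C.
colnames(data_binom) <- c("A", "B", "C")

# 3D scatter plot of the three variables; pch selects the plotting symbol.
scatterplot3d(
  data_binom$A, data_binom$B, data_binom$C,
  xlab = "A", ylab = "B", zlab = "C",
  pch = 16, color = "steelblue",
  main = "Binomial data"
)
```

Because the values are integers from 0 to 4, many points overlap, so the plot shows fewer than 50 distinct symbols.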

This is the final clustering result for this case, with 2 clusters. There are 33 objects in cluster 1 and 17 objects in cluster 2. The cluster modes show the value that appears most often in each cluster for each variable.

result of the clustering
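A result of this kind can be obtained with `kmodes` from klaR, sketched here on regenerated data. The seed and object names are assumptions, and because both the data and the initial modes are random, the cluster sizes will not necessarily be 33 and 17.

```r
library(klaR)

set.seed(123)  # assumption: arbitrary seed for a repeatable example
data_binom <- data.frame(
  A = rbinom(50, size = 4, prob = 0.5),
  B = rbinom(50, size = 4, prob = 0.5),
  C = rbinom(50, size = 4, prob = 0.5)
)

# modes = 2 asks for two clusters; the initial modes are picked
# at random from the rows of the data.
result <- kmodes(data_binom, modes = 2)

result$size           # number of objects in each cluster
result$modes          # modal value of each variable, per cluster
result$withindiff     # within-cluster simple-matching distance, per cluster
head(result$cluster)  # cluster assigned to the first few objects
```

Printing `result` itself shows these parts together in one summary.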

The cluster modes are the mode values of each variable. For example, in cluster 1 the value that appears most often for variable A is 1, and for variable B it is 2. The within-cluster simple-matching distance summarizes, for each cluster, how far its objects are from the cluster center (the modes), where the distance between an object and the modes is the number of variables on which they differ.
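To make the two ideas concrete, here is a small base R sketch of a mode and of the simple-matching distance. The helper names `stat_mode` and `simple_matching` are my own, and the example vectors are hypothetical (only the modes A = 1 and B = 2 come from the text above; the value for C is made up).

```r
# Mode of a vector: the value with the highest count (first one on ties).
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Simple-matching distance: number of variables on which two objects differ.
simple_matching <- function(x, y) sum(x != y)

stat_mode(c(1, 2, 2, 3, 2))  # "2", because 2 appears three times

cluster_mode <- c(A = 1, B = 2, C = 2)  # hypothetical cluster modes
object       <- c(A = 1, B = 3, C = 2)  # hypothetical object

simple_matching(object, cluster_mode)   # 1, because only B differs
```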

The result above can be summarized in the table below, which shows the cluster assigned to each object. This table can be produced with cbind, which combines vectors, matrices, or data frames column by column.

example of the resulting cluster table
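A table like this could be built as follows. The sketch repeats the data generation and clustering so it runs on its own; the seed, the object names, and the column name `cluster` are assumptions.

```r
library(klaR)

set.seed(123)  # assumption: arbitrary seed for a repeatable example
data_binom <- data.frame(
  A = rbinom(50, size = 4, prob = 0.5),
  B = rbinom(50, size = 4, prob = 0.5),
  C = rbinom(50, size = 4, prob = 0.5)
)
result <- kmodes(data_binom, modes = 2)

# Attach the cluster label of each object as an extra column.
clustered <- cbind(data_binom, cluster = result$cluster)

head(clustered)  # first rows: A, B, C and the assigned cluster
```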

Besides this case, we can use other kinds of data as well, for example from our own CSV or Excel file, as shown below:

syntax for clustering KModes with csv file

The data that we use in this case is in columns 2 to 6 of the table data_kmod. The table below is data_kmod:

We will use columns 2 to 6 (x1 to x5), so we use 5 variables and 10 objects. We can use colnames to rename each column that we use.
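The column selection and renaming can be sketched as below. Since the original file is not available, the snippet first writes a placeholder CSV with made-up values so that it runs on its own; the file path, the placeholder values, the first column name `no`, and the new names A to E are all assumptions. For an Excel file, a reader such as `readxl::read_excel` could be used in place of `read.csv`.

```r
# Placeholder file so the sketch is runnable; replace csv_path with your own file.
csv_path <- tempfile(fileext = ".csv")
set.seed(1)  # assumption: arbitrary seed for the made-up values
placeholder <- data.frame(
  no = 1:10,
  matrix(sample(1:3, 50, replace = TRUE), nrow = 10,
         dimnames = list(NULL, paste0("x", 1:5)))
)
write.csv(placeholder, csv_path, row.names = FALSE)

# Read the file and keep columns 2 to 6 (x1 to x5).
data_kmod <- read.csv(csv_path)
data_used <- data_kmod[, 2:6]

# Rename the selected columns (hypothetical new names).
colnames(data_used) <- c("A", "B", "C", "D", "E")

data_used
```

The selected columns can then be passed to `kmodes` in the same way as the binomial data.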

the data
after changing the column names

This is the result:

The next post will cover KModes and KPrototypes clustering with Python. I hope to finish it soon.

pray, will be a light and hope at every step -Uy.2021
