This function runs many existing procedures for determining how many clusters are present in the data. It returns the number of clusters supported by the largest number of procedures (maximum consensus). In case of ties, it selects the solution with fewer clusters.
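The consensus rule described above can be illustrated with a minimal base R sketch. The `votes` vector is hypothetical data made up for illustration, and the code is not the package's internal implementation; it only shows how a majority vote with a "fewer clusters wins" tie-break behaves.

```r
# Hypothetical votes: the number of clusters suggested by each of eight indices.
votes <- c(2, 3, 3, 2, 4, 2, 3, 5)

# Count how many indices support each candidate solution.
counts <- table(votes)

# Keep the candidates with the most votes; here 2 and 3 are tied at three votes.
winners <- as.integer(names(counts)[counts == max(counts)])

# Tie-break: prefer the solution with fewer clusters.
n_selected <- min(winners)
n_selected
```

In this sketch both 2 and 3 receive three votes, so the tie-break picks the smaller solution.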

n_clusters(
  x,
  standardize = TRUE,
  include_factors = FALSE,
  package = c("NbClust", "mclust", "cluster", "M3C"),
  fast = TRUE,
  ...
)

n_clusters_elbow(
  x,
  standardize = TRUE,
  include_factors = FALSE,
  clustering_function = stats::kmeans,
  n_max = 15,
  ...
)

n_clusters_gap(
  x,
  standardize = TRUE,
  include_factors = FALSE,
  clustering_function = stats::kmeans,
  n_max = 15,
  ...
)

n_clusters_silhouette(
  x,
  standardize = TRUE,
  include_factors = FALSE,
  clustering_function = stats::kmeans,
  n_max = 15,
  ...
)

Arguments

x

A data frame.

standardize

Logical, if TRUE (default), the data frame is standardized before clustering.
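As a rough illustration of what standardizing before clustering means, the sketch below centers and scales each column with base R's `scale()`. Whether the package uses `scale()` or another routine internally is an assumption; the point is only that every variable ends up with mean 0 and standard deviation 1, so no single variable dominates the distance computations.

```r
# Numeric columns of the built-in iris data.
x <- iris[, 1:4]

# Center each column to mean 0 and scale it to standard deviation 1.
x_std <- as.data.frame(scale(x))

# Check the result: means are (numerically) zero, standard deviations are one.
col_means <- colMeans(x_std)
col_sds <- apply(x_std, 2, sd)
round(col_means, 10)
round(col_sds, 10)
```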

include_factors

Logical, if TRUE, factors are converted to numerical values in order to be included in the data for determining the number of clusters. By default, factors are removed, because most methods that determine the number of clusters need numeric input only.
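The two behaviours can be sketched in base R as follows. This is an illustration only: the variable names are hypothetical, and converting a factor to its integer level codes with `as.numeric()` is an assumption about how such a conversion might be done, not a statement about the package's internals.

```r
x <- iris  # four numeric columns plus the factor column Species

# Identify the factor columns.
is_fac <- vapply(x, is.factor, logical(1))

# Behaviour analogous to include_factors = FALSE: drop the factor columns.
x_dropped <- x[, !is_fac, drop = FALSE]

# Behaviour analogous to include_factors = TRUE: replace each factor
# by its integer level codes so that it can enter numeric computations.
x_converted <- x
x_converted[is_fac] <- lapply(x_converted[is_fac], as.numeric)

names(x_dropped)
str(x_converted)
```

Note that integer codes impose an ordering and equal spacing on the factor levels, which may or may not be meaningful for a given variable.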

package

Package from which methods are to be called to determine the number of clusters. Can be "all" or a vector containing "NbClust", "mclust", "cluster" and "M3C".

fast

If FALSE, will compute 4 more indices (sets index = "alllong" in NbClust). This has been deactivated by default as it is computationally heavy.

...

Arguments passed to or from other methods.

clustering_function

The clustering function to use. Can be kmeans, cluster::pam, cluster::clara, cluster::fanny, and more. See fviz_nbclust.

n_max

Maximal number of clusters to test.

Note

There is also a plot()-method implemented in the see-package.

Examples

library(parameters)
# \donttest{
if (require("mclust", quietly = TRUE) && require("NbClust", quietly = TRUE) &&
  require("cluster", quietly = TRUE) && require("see")) {
  n <- n_clusters(iris[, 1:4], package = c("NbClust", "mclust", "cluster"))
  n
  as.data.frame(n)
  plot(n)

  # The following runs all the methods but is significantly slower
  # n_clusters(iris[, 1:4], standardize = FALSE, package = "all", fast = FALSE)
}
#> Loading required package: see
#> Error: package or namespace load failed for ‘see’:
#>  object ‘check_heterogeneity’ is not exported by 'namespace:parameters'
# }
# Elbow method --------------------
if (require("openxlsx") && require("see")) {
  x <- n_clusters_elbow(iris[1:4])
  x
  as.data.frame(x)
  plot(x)
}
#> Loading required package: openxlsx
#> Loading required package: see
#> Error: package or namespace load failed for ‘see’:
#>  object ‘check_heterogeneity’ is not exported by 'namespace:parameters'
# Gap method --------------------
x <- n_clusters_gap(iris[1:4])
x
#> The Gap method, that compares the total intracluster variation of k clusters with their expected values under null reference distribution of the data, suggests that the optimal number of clusters is 2.
as.data.frame(x)
#>    n_Clusters       Gap         SE
#> 1           1 0.2214401 0.02507538
#> 2           2 0.4740829 0.03319823
#> 3           3 0.4942913 0.02579692
#> 4           4 0.4409764 0.02755421
#> 5           5 0.4017140 0.02235386
#> 6           6 0.4498149 0.02273822
#> 7           7 0.4483496 0.02347120
#> 8           8 0.4623633 0.02329825
#> 9           9 0.4767078 0.02639123
#> 10         10 0.4714515 0.02166452
#> 11         11 0.4203812 0.02390283
#> 12         12 0.3406179 0.02219453
#> 13         13 0.5486026 0.02587561
#> 14         14 0.4818785 0.02482117
#> 15         15 0.5270261 0.02403941
plot(x)
#> Error: Package 'see' is required for this function to work.
#>   Please install it by running install.packages('see').
# Silhouette method --------------------
x <- n_clusters_silhouette(iris[1:4])
x
#> The Silhouette method, based on the average quality of clustering, suggests that the optimal number of clusters is 2.
as.data.frame(x)
#>    n_Clusters Silhouette
#> 1           1  0.0000000
#> 2           2  0.5817500
#> 3           3  0.4599482
#> 4           4  0.4188923
#> 5           5  0.3536197
#> 6           6  0.3445645
#> 7           7  0.3152814
#> 8           8  0.3462564
#> 9           9  0.3180890
#> 10         10  0.3187029
#> 11         11  0.2794598
#> 12         12  0.3985803
#> 13         13  0.3590287
#> 14         14  0.2922473
#> 15         15  0.3347490
plot(x)
#> Error: Package 'see' is required for this function to work.
#>   Please install it by running install.packages('see').
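
The printed Gap and Silhouette tables above can be used to re-derive the suggested number of clusters by hand. The sketch below copies the values from the output and applies two simple rules in base R: for the gap statistic, the smallest k whose gap lies within one standard error of the first local maximum (the rule documented as "firstSEmax" in cluster::maxSE), and for the silhouette, the k with the largest average width. That n_clusters_gap() and n_clusters_silhouette() use exactly these rules is an assumption; the sketch only shows that they reproduce the suggestion of 2 for these numbers.

```r
# Gap values and standard errors copied from the as.data.frame(x) output above.
gap <- c(0.2214401, 0.4740829, 0.4942913, 0.4409764, 0.4017140,
         0.4498149, 0.4483496, 0.4623633, 0.4767078, 0.4714515,
         0.4203812, 0.3406179, 0.5486026, 0.4818785, 0.5270261)
se <- c(0.02507538, 0.03319823, 0.02579692, 0.02755421, 0.02235386,
        0.02273822, 0.02347120, 0.02329825, 0.02639123, 0.02166452,
        0.02390283, 0.02219453, 0.02587561, 0.02482117, 0.02403941)

# Position of the first local maximum: the first k after which the gap decreases.
first_max <- which(diff(gap) <= 0)[1]

# Smallest k whose gap is within one standard error of that local maximum.
threshold <- gap[first_max] - se[first_max]
k_gap <- which(gap[seq_len(first_max)] >= threshold)[1]

# For comparison: the k with the globally largest gap value.
k_gap_global <- which.max(gap)

# Average silhouette widths copied from the output above.
sil <- c(0.0000000, 0.5817500, 0.4599482, 0.4188923, 0.3536197,
         0.3445645, 0.3152814, 0.3462564, 0.3180890, 0.3187029,
         0.2794598, 0.3985803, 0.3590287, 0.2922473, 0.3347490)

# The k with the largest average silhouette width.
k_sil <- which.max(sil)

c(gap_rule = k_gap, gap_global = k_gap_global, silhouette = k_sil)
```

The standard-error rule favours a parsimonious solution: the global maximum of the gap curve sits at a much larger k, but the first local maximum and its standard error already point to a small number of clusters.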