Is there a distance function that can calculate both binary and numeric column distances at once?
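library(dplyr)

df <- tibble(Observation = 1:6, V1 = c(3, 5, 4, 6, 9, 5),
             V2 = c("a", "b", "a", "c", "b", "a"),
             label = c("Red", "Red", "Blue", "Blue", "Red", "Blue")) %>%
  select(2:4) %>%                # keep V1, V2 and label
  fastDummies::dummy_cols() %>%  # one 0/1 column per level of V2 and label
  select(c(-V2, -label))         # drop the original character columns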
I typically use dist(df, method = 'binary'), but now I have a numeric column alongside the new dummy columns I created. The numeric column, V1, is just as important as the dummy variables.
CodePudding user response:
The distmix function from the kmed package handles this: you pass the column indices of the numeric, binary, and categorical variables in idnum, idbin, and idcat respectively. From ?distmix:
idnum - A vector of column index of the numerical variables.
idbin - A vector of column index of the binary variables.
idcat - A vector of column index of the categorical variables.
library(kmed)
distmix(df1, idnum = 1, idbin = 2:ncol(df1))
In the example data, the numeric column is the first column and all the other columns are binary, so we pass 2:ncol(df1) as the index for idbin.
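To get a feel for what a mixed distance does with the two kinds of columns, here is a minimal base-R sketch of a Gower-style distance. It is only an illustration: gower_sketch is a hypothetical helper, not part of kmed, and it assumes the numeric column is scaled by its range and the binary columns are compared by simple matching, so the numbers need not match what distmix returns. The dummy columns are typed out by hand so the block runs without extra packages.

```r
# Example data from the question, with the dummy columns written out by hand
df1 <- data.frame(
  V1         = c(3, 5, 4, 6, 9, 5),
  V2_a       = c(1, 0, 1, 0, 0, 1),
  V2_b       = c(0, 1, 0, 0, 1, 0),
  V2_c       = c(0, 0, 0, 1, 0, 0),
  label_Blue = c(0, 0, 1, 1, 0, 1),
  label_Red  = c(1, 1, 0, 0, 1, 0)
)

# Hypothetical helper (NOT the kmed implementation): Gower-style distance
gower_sketch <- function(data, idnum, idbin) {
  num <- as.matrix(data[, idnum, drop = FALSE])
  bin <- as.matrix(data[, idbin, drop = FALSE])

  # Divide each numeric column by its range so a difference lies in [0, 1]
  # (assumes no numeric column is constant, otherwise the range is 0)
  rng <- apply(num, 2, function(x) diff(range(x)))
  num_scaled <- sweep(num, 2, rng, "/")

  # Manhattan distance sums the per-column differences; for 0/1 columns
  # |0 - 1| = 1, so each mismatch in a binary column adds exactly 1
  total <- as.matrix(dist(cbind(num_scaled, bin), method = "manhattan"))

  # Average over the number of variables so every column weighs the same
  total / (length(idnum) + length(idbin))
}

d <- gower_sketch(df1, idnum = 1, idbin = 2:ncol(df1))
as.dist(d)  # same kind of object that dist() returns
```

Because every column contributes a value between 0 and 1 before averaging, V1 carries the same weight as each dummy column, which is what the question asks for.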
data
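library(dplyr)

df1 <- tibble(Observation = 1:6, V1 = c(3, 5, 4, 6, 9, 5),
              V2 = c("a", "b", "a", "c", "b", "a"),
              label = c("Red", "Red", "Blue", "Blue", "Red", "Blue")) %>%
  select(2:4) %>%                # keep V1, V2 and label
  fastDummies::dummy_cols() %>%  # one 0/1 column per level of V2 and label
  select(c(-V2, -label))         # drop the original character columns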