I have a dataset like this:
id<-c(1:6)
value<-c(" ","1", "1 6","1 777"," ", " ")
df<-data.frame(id, value)
Now I would like to convert it into dummy variables, one per distinct value, using 1 for "yes" and 0 for "no". In other words, instead of counting each combination as its own value, I would like to count each individual value for each observation. For example, the first observation is NA, so only the NA column is 1; the third observation chose the combination "1" and "6", so in row 3 the columns "1" and "6" are marked 1 (yes). Ideally the resulting table looks like this:
id 1 6 777 NA
1 0 0 0 1
2 1 0 0 0
3 1 1 0 0
4 1 0 1 0
5 0 0 0 1
6 0 0 0 1
I tried the "fastDummies" package; my code is:
df<-dummy_cols(df,
select_columns="value",
split="")
It does not work as intended. Is there a solution for this case?
Also, the dummy columns come out with names like "value_" and "value_6". Is there a way to name them after the values themselves, i.e. "1", "6", "777", "NA"? Thanks a lot!
CodePudding user response:
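We may need to convert the blank (" ") elements to NA first, and then split on a single space (split = " ") rather than on the empty string. The "value_" prefix can be removed afterwards with rename_with.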
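library(dplyr)
library(fastDummies)
library(tidyr)
library(stringr)
df %>%
  # turn blank entries into NA, column-wise (na_if() on a whole data frame errors in dplyr >= 1.1.0)
  mutate(value = na_if(value, " ")) %>%
  # one dummy column per space-separated token
  dummy_cols("value", split = " ", remove_selected_columns = TRUE) %>%
  # fill any NA left in the dummy columns with 0
  mutate(across(starts_with("value_"), ~ replace_na(.x, 0))) %>%
  # drop the "value_" prefix so the columns are named "1", "6", "777", "NA"
  rename_with(~ str_remove(.x, "value_"), starts_with("value_"))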
-output
id 1 6 777 NA
1 1 0 0 0 1
2 2 1 0 0 0
3 3 1 1 0 0
4 4 1 0 1 0
5 5 0 0 0 1
6 6 0 0 0 1
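If you would rather not depend on fastDummies, the same table can probably be built in base R as well. The following is only a minimal sketch, not the approach used above: it assumes the values are separated by whitespace and that a blank entry should count as "NA". The names tokens, lev, dummies and result are made up for illustration.

```r
# Rebuild the example data (stringsAsFactors = FALSE keeps `value` as character)
id <- 1:6
value <- c(" ", "1", "1 6", "1 777", " ", " ")
df <- data.frame(id, value, stringsAsFactors = FALSE)

# Split every entry on whitespace; a blank entry yields no tokens,
# so it is replaced by the single label "NA" (an assumption of this sketch)
tokens <- lapply(strsplit(trimws(df$value), "\\s+"), function(x) {
  if (length(x) == 0) "NA" else x
})

# Distinct labels in order of first appearance, with "NA" moved to the end
lev <- unique(unlist(tokens))
lev <- c(setdiff(lev, "NA"), intersect(lev, "NA"))

# One row per observation, one 0/1 column per label
dummies <- do.call(rbind, lapply(tokens, function(x) as.integer(lev %in% x)))
colnames(dummies) <- lev

# check.names = FALSE keeps column names such as "1" and "NA" unchanged
result <- data.frame(id = df$id, dummies, check.names = FALSE)
print(result)
```

Because the column names are taken directly from the labels, no renaming step is needed afterwards.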