I have a data frame with a column ("name") that contains names of fruits:
name
Apple
Apple
Mango
Banana
Banana
Orange
Mango
Orange
... and so on. I have 9 fruits in my data.
I want to create new variables following the naming rule "name_'data'". So, I want to add 9 more variables such that:
name name_Apple name_Mango name_Banana name_Orange
Apple 1 0 0 0
Apple 1 0 0 0
Mango 0 1 0 0
Banana 0 0 1 0
Banana 0 0 1 0
Orange 0 0 0 1
Mango 0 1 0 0
Orange 0 0 0 1
I want to use a for loop to do this since data will be added to the existing frame. I have tried this:
name_list <- c("Apple", "Mango", "Banana", "Orange")
for (i in name_list) {
df_main$name_[[i]] <- ifelse(df_main$name == [[i]], 1, 0)
}
I get the error "Error: unexpected '[['". I think I'm referencing character data wrong in the loop, but can't figure out how to do it correctly. Will mutate() work better here?
CodePudding user response:
We can use dummy_cols() from the fastDummies package:
library(fastDummies)
library(magrittr)  # provides the %>% pipe
df1 %>%
  dummy_cols('name')
Output:
name name_Apple name_Banana name_Mango name_Orange
1 Apple 1 0 0 0
2 Apple 1 0 0 0
3 Mango 0 0 1 0
4 Banana 0 1 0 0
5 Banana 0 1 0 0
6 Orange 0 0 0 1
7 Mango 0 0 1 0
8 Orange 0 0 0 1
Data:
df1 <- structure(list(name = c("Apple", "Apple", "Mango", "Banana",
"Banana", "Orange", "Mango", "Orange")), class = "data.frame", row.names = c(NA,
-8L))
CodePudding user response:
In base R, you can do:
mat <- outer(df$name, unique(df$name), function(a, b) as.numeric(a == b))
cbind(df, setNames(as.data.frame(mat), paste0('name_', unique(df$name))))
#> name name_Apple name_Mango name_Banana name_Orange
#> 1 Apple 1 0 0 0
#> 2 Apple 1 0 0 0
#> 3 Mango 0 1 0 0
#> 4 Banana 0 0 1 0
#> 5 Banana 0 0 1 0
#> 6 Orange 0 0 0 1
#> 7 Mango 0 1 0 0
#> 8 Orange 0 0 0 1
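If a loop is preferred, as in the question, the original attempt can be made to work in base R. The error arises because `$` only accepts a literal column name, so `df_main$name_[[i]]` cannot build a name from `i`, and `== [[i]]` is a syntax error because `[[` has nothing to index. A minimal sketch, assuming the data frame is called df_main as in the question and that the fruit names are taken from the data rather than typed by hand:

```r
# Example data matching the question (df_main is the question's name).
df_main <- data.frame(
  name = c("Apple", "Apple", "Mango", "Banana",
           "Banana", "Orange", "Mango", "Orange"),
  stringsAsFactors = FALSE
)

# Take the fruit names from the data so newly added fruits are picked up.
name_list <- unique(df_main$name)

for (i in name_list) {
  # Build the column name as a string and assign it with [[ ]].
  # Comparing against i directly gives TRUE/FALSE, converted to 1/0.
  df_main[[paste0("name_", i)]] <- as.integer(df_main$name == i)
}

df_main
```

Re-running the loop after rows are appended should recreate the indicator columns for all rows, including any fruit that was not present before.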
CodePudding user response:
Another way:
model.matrix(~ name - 1, data = df)
# nameApple nameBanana nameMango nameOrange
# 1 1 0 0 0
# 2 1 0 0 0
# 3 0 0 1 0
# 4 0 1 0 0
# 5 0 1 0 0
# 6 0 0 0 1
# 7 0 0 1 0
# 8 0 0 0 1
data:
structure(list(name = c("Apple", "Apple", "Mango", "Banana",
"Banana", "Orange", "Mango", "Orange")), class = "data.frame", row.names = c(NA,
-8L)) -> df
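Note that model.matrix() names its columns nameApple, nameBanana, and so on, and returns only the indicator matrix. A possible sketch for getting the name_ prefix from the question and attaching the columns to the original data frame, assuming the same df as above and that the column contains no missing values (model.matrix() drops rows with NA by default, which would make the row counts differ):

```r
df <- data.frame(
  name = c("Apple", "Apple", "Mango", "Banana",
           "Banana", "Orange", "Mango", "Orange"),
  stringsAsFactors = FALSE
)

# One indicator column per fruit; "- 1" removes the intercept.
mm <- model.matrix(~ name - 1, data = df)

# Turn "nameApple" into "name_Apple" by rewriting the leading variable name.
colnames(mm) <- sub("^name", "name_", colnames(mm))

# Attach the indicator columns to the original data frame.
out <- cbind(df, mm)
out
```

The columns come out in alphabetical order of the fruit names, because model.matrix() treats the character column as a factor.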