Here is an example data frame:
df <- data.frame ("user.id" = c("usr1", "usr2", "usr3", "usr4"),
"gender" = c("m", "f", "m", "f"),
"fav.car.rank" = c("toyota, subaru, honda", "honda, subaru, toyota", "toyota, honda, subaru", "subaru, toyota, honda"))
I have a column called “fav.car.rank”. The car brands are comma separated and in an hierarchical order (from most fav to least fav. // e.g. usr1’s most fav. brand is toyota).
I want to create separate columns for each car brand which includes the ranking of that car brand. I want a data frame that looks like this:
user.id | gender | toyota | subaru | honda |
---|---|---|---|---|
usr1 | m | 3 | 2 | 1 |
usr2 | f | 1 | 2 | 3 |
usr3 | m | 3 | 1 | 2 |
usr4 | f | 2 | 3 | 1 |
Thank you!
CodePudding user response:
An easier option is to split at the ,
to expand the rows with separate_rows
, create the sequence column by rev
ersed after grouping by 'user.id' and then reshape to 'wide' format with pivot_wider
library(dplyr)
library(tidyr)
df %>%
separate_rows(fav.car.rank) %>%
group_by(user.id) %>%
mutate(n = rev(row_number())) %>%
ungroup %>%
pivot_wider(names_from = fav.car.rank, values_from = n)
-output
# A tibble: 4 × 5
user.id gender toyota subaru honda
<chr> <chr> <int> <int> <int>
1 usr1 m 3 2 1
2 usr2 f 1 2 3
3 usr3 m 3 1 2
4 usr4 f 2 3 1