Filtering hierarchically ordered comma separated text in a row in R-CodePudding

Here is an example data frame:

df <- data.frame ("user.id" = c("usr1", "usr2", "usr3", "usr4"),
                  "gender" = c("m", "f", "m", "f"),
                  "fav.car.rank" = c("toyota, subaru, honda", "honda, subaru, toyota", "toyota, honda, subaru", "subaru, toyota, honda"))

I have a column called “fav.car.rank”. The car brands are comma separated and in an hierarchical order (from most fav to least fav. // e.g. usr1’s most fav. brand is toyota).

I want to create separate columns for each car brand which includes the ranking of that car brand. I want a data frame that looks like this:

user.id	gender	toyota	subaru	honda
usr1	m	3	2	1
usr2	f	1	2	3
usr3	m	3	1	2
usr4	f	2	3	1

Thank you!

CodePudding user response：

An easier option is to split at the , to expand the rows with separate_rows, create the sequence column by reversed after grouping by 'user.id' and then reshape to 'wide' format with pivot_wider

library(dplyr)
library(tidyr)
df %>% 
 separate_rows(fav.car.rank) %>% 
 group_by(user.id) %>%
 mutate(n = rev(row_number())) %>%
 ungroup %>%
 pivot_wider(names_from = fav.car.rank, values_from = n)

-output

# A tibble: 4 × 5
  user.id gender toyota subaru honda
  <chr>   <chr>   <int>  <int> <int>
1 usr1    m           3      2     1
2 usr2    f           1      2     3
3 usr3    m           3      1     2
4 usr4    f           2      3     1