I'm currently doing research and collecting a ranked-choice data. Basically people choosing their preferences in a topic. E.g., people ranking their preference on fruits: orange, mango, apple, avocado
The clean data frame looks like this:
Fruits Color
1 orange;apple;banana;avocado blue;yellow;red;green
2 avocado;apple;banana;orange red;green;blue;yellow
3 apple;banana;orange;avocado yellow;red;green;blue
4 banana;orange;apple;avocado green;blue;red;yellow
5 apple;avocado;banana;orange yellow;blue;green;red
The first person put orange as their first preference, then apple, banana, and avocado as the last preference. and so on
Scoring: 1st preference = 4; 2nd preference = 3; 3rd preference = 2; 4th preference = 1
Desired result
apple avocado banana orange blue green red yellow
1 3 1 2 4 4 1 2 3
2 3 4 2 1 2 3 4 1
3 4 1 3 2 1 2 3 4
4 2 1 4 3 3 4 2 1
5 4 3 2 1 3 2 1 4
The part that I confused is to figure out how to give score for each column -> turn from semicolon separated string into column with numeric value. If I can pass this, I can create the desired output dataframe.
I've found pmr
package, but the documentation only a few. Moreover, that package is too advance. I don't really need that for current state, just need simple scores for each preferences
Please help me at the scoring stage
CodePudding user response:
Here's an approach that works using a few lapply()
and vapply()
calls, but will generalize to more columns.
library(tibble)
d <- tibble::tribble(
~Fruits, ~Color,
"orange;apple;banana;avocado", "blue;yellow;red;green",
"avocado;apple;banana;orange", "red;green;blue;yellow",
"apple;banana;orange;avocado", "yellow;red;green;blue",
"banana;orange;apple;avocado", "green;blue;red;yellow",
"apple;avocado;banana;orange", "yellow;blue;green;red"
)
x <- lapply(unname(d), \(col) {
l <- col %>% strsplit(";")
x <- l[[1]] %>% unique() %>% sort()
out <- lapply(x, \(x) {
vapply(l, FUN.VALUE = numeric(1), \(vec) which(rev(vec) == x))
})
names(out) <- x
as.data.frame(out)
})
do.call(cbind, x)
apple avocado banana orange blue green red yellow
1 3 1 2 4 4 1 2 3
2 3 4 2 1 2 3 4 1
3 4 1 3 2 1 2 3 4
4 2 1 4 3 3 4 2 1
5 4 3 2 1 3 2 1 4
CodePudding user response:
Here’s a tidyverse solution built on tidyr::separate_rows()
to break apart the items and dplyr::n() 1 - dplyr::row_number()
to get reversed ranks by person and category.
library(dplyr)
library(tidyr)
prefs %>%
mutate(id = row_number()) %>%
separate_rows(!id) %>%
pivot_longer(
!id,
names_to = "category",
values_to = "item"
) %>%
group_by(id, category) %>%
mutate(rank = n() 1 - row_number()) %>%
ungroup() %>%
arrange(category) %>%
pivot_wider(
id_cols = id,
names_from = item,
values_from = rank
)
# A tibble: 5 × 9
id blue yellow red green orange apple banana avocado
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 3 2 1 4 3 2 1
2 2 2 1 4 3 1 3 2 4
3 3 1 4 3 2 2 4 3 1
4 4 3 1 2 4 3 2 4 1
5 5 3 4 1 2 1 4 2 3