In this dataset DF, we have 4 names and 4 professions.
DF<-tribble(
~names, ~princess, ~singer, ~astronaut, ~painter,
"diana", 4, 1, 2, 3,
"shakira", 2, 1, 3, 4,
"armstrong", 3, 4, 1, 2,
"picasso", 1, 3, 1, 4
)
Assume that the cell values are some measure of their their profession. So, for instance, Diana has highest cell value for princess (correctly) but Shakira has highest cell value for painter (incorrectly).
I want to create two columns called "Compatible" and "Incompatible" where the program will pick value of 4 for Diana as it is under the correct profession Princess and assign it to column "Compatible" and in the "Incompatible" put an average of the other 3 values. For Shakira, it will pick the value 1 from the correct profession of singer, and assign it to Compatible; for Incompatible it average the other values. Similarly for other names
So the output will be like this:
DF1<-tribble(
~names, ~princess, ~singer, ~astronaut, ~painter,~Compatible,~Incompatible,
"diana", 4, 1, 2, 3, 4, 2,
"shakira", 2, 1, 3, 4, 1, 3,
"armstrong", 3, 4, 1, 2, 1, 3,
"picasso", 1, 3, 1, 4, 4, 1.66
)
Here is the dataset which shows the correct names and professions:
DF3<- tribble(
~names, ~professions,
"diana", "princess",
"shakira", "singer",
"armstrong", "astronaut",
"picasso", "painter"
)
CodePudding user response:
DF1[1:5] %>%
pivot_longer(-names) %>%
left_join(DF3, 'names') %>%
group_by(names, name = if_else(name == professions, 'compatible', 'incompatible')) %>%
summarise(profession = first(professions), value = mean(value), .groups = 'drop') %>%
pivot_wider()
# A tibble: 4 x 4
names profession compatible incompatible
<chr> <chr> <dbl> <dbl>
1 armstrong astronaut 1 3
2 diana princess 4 2
3 picasso painter 4 1.67
4 shakira singer 1 3