Creating a factor from data across several columns, with priorities, R

Time:10-08

I have a dataset similar to this:

# Example data: one row per patient; 1 = that injury happened, 0 = it didn't
df <- structure(list(Person = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Stab = c(1, 
0, 1, 0, 0, 1, 0, 0, 0, 0), Shot = c(0, 0, 1, 1, 0, 0, 0, 0, 
0, 1), Cut = c(0, 1, 1, 1, 0, 0, 0, 1, 0, 1), ShotBow = c(0, 
0, 1, 0, 1, 0, 0, 0, 0, 0), Punched = c(0, 0, 1, 0, 1, 0, 0, 
1, 0, 0), Slapped = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0), `Car Accident` = c(0, 
0, 1, 0, 0, 0, 0, 0, 0, 0), `Bicycle Accident` = c(0, 0, 1, 0, 
0, 0, 1, 0, 0, 0), FellOver = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0)), 
row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

As you can see, the data describes different patients and what happened to them. The variables in the real dataset are slightly different but similar in kind: "stabbed", "shot", "slapped", etc.

I want to collapse all these columns into a single column that effectively measures "how bad is the injury". Working with some of my medical colleagues, we've decided on some rankings (again, these aren't the real injuries; I made these ones up).

The rankings for this fake one are:

Level 1 Severity (worst)

  • Car Accident
  • Shot
  • ShotBow

Level 2 Severity (not as bad)

  • Stab
  • Cut
  • Bicycle Accident

Level 3 Severity (really not bad)

  • Punched
  • Slapped
  • Fell Over
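The rankings above can be stored as data and flattened into a named vector that maps each column name to its severity level. This is a minimal base R sketch; `severity_groups` and `severity_lookup` are illustrative names, not from the original post, and the column names are assumed to match the example data exactly (e.g. `FellOver` rather than "Fell Over").

```r
# Each severity level lists the injury columns that belong to it
severity_groups <- list(
  level1 = c("Car Accident", "Shot", "ShotBow"),   # worst
  level2 = c("Stab", "Cut", "Bicycle Accident"),   # not as bad
  level3 = c("Punched", "Slapped", "FellOver")     # really not bad
)

# Flatten to a named vector: names are column names, values are levels 1-3
severity_lookup <- setNames(
  rep(seq_along(severity_groups), lengths(severity_groups)),
  unlist(severity_groups, use.names = FALSE)
)

severity_lookup[["Cut"]]  # level of the Cut column
```

Looking levels up by name, as in `severity_lookup[["Cut"]]`, avoids mistakes that come from relying on the order of the columns in the data.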

What I want to do is create a variable called "Severity" and give each patient a 1, 2, or 3 based on which injury columns they have a 1 in, prioritizing the most severe injury. For example, Patient 1 was stabbed, so they get a "2" (for level 2). Patient 8 was cut and punched, so they'd get a 2 for the cut, which overrules less severe injuries such as the punch. Patient 10 was shot and cut, so they'd get a "1" because shot is more severe than cut.
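The "most severe injury wins" rule can be sketched directly in base R by checking the levels from most to least severe and stopping at the first match. This rebuilds the example data as a plain `data.frame` with the same values as the `dput()` above; `level1`, `has_any` and the other names are illustrative assumptions.

```r
# Same values as the example data; check.names = FALSE keeps the spaces in names
df <- data.frame(
  Person = 1:10,
  Stab = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  Shot = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 1),
  Cut = c(0, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  ShotBow = c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0),
  Punched = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0),
  Slapped = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
  `Car Accident` = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  `Bicycle Accident` = c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0),
  FellOver = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  check.names = FALSE
)

level1 <- c("Car Accident", "Shot", "ShotBow")
level2 <- c("Stab", "Cut", "Bicycle Accident")
level3 <- c("Punched", "Slapped", "FellOver")

# TRUE for each row where at least one of `cols` is 1
has_any <- function(data, cols) unname(rowSums(data[cols] == 1) > 0)

# Test the most severe level first; rows with no injury at all get NA
df$Severity <- ifelse(has_any(df, level1), 1,
               ifelse(has_any(df, level2), 2,
               ifelse(has_any(df, level3), 3, NA)))
df$Severity
# [1] 2 2 1 1 1 2 2 2 3 1
```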

My expected output would look like this: (image not available)

Answer:

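One option is a named vector that maps each injury column to its severity level. Inside across(), each 0 is turned into NA and each 1 is multiplied by that column's level; the row-wise minimum (ignoring NA) is then the level of the most severe injury.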

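library(dplyr)

# Severity level of each injury column, in the order of names(df)[-1]:
# Stab, Shot, Cut, ShotBow, Punched, Slapped, Car Accident, Bicycle Accident, FellOver
nm1 <- setNames(c(2, 1, 2, 1, 3, 3, 1, 2, 3), names(df)[-1])
df %>%
     # 0 -> NA, 1 -> that column's level; then take the row-wise minimum ignoring NA.
     # do.call() passes every column to pmin() as a separate argument.
     mutate(Severity = across(-Person, ~ na_if(., 0) * nm1[[cur_column()]])  %>% 
           {do.call(pmin, c(., na.rm = TRUE))})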
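If you'd rather not depend on dplyr, the same "level or NA, then row-wise minimum" idea can be sketched in base R. This assumes the example data rebuilt below with the same values as the question's `dput()`; `levels_by_col` is an illustrative name.

```r
df <- data.frame(
  Person = 1:10,
  Stab = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  Shot = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 1),
  Cut = c(0, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  ShotBow = c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0),
  Punched = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0),
  Slapped = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
  `Car Accident` = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  `Bicycle Accident` = c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0),
  FellOver = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  check.names = FALSE
)

# Same lookup as in the answer: column name -> severity level
nm1 <- setNames(c(2, 1, 2, 1, 3, 3, 1, 2, 3), names(df)[-1])

# For each injury column: its level where the indicator is 1, otherwise NA
levels_by_col <- lapply(names(nm1), function(col) ifelse(df[[col]] == 1, nm1[[col]], NA))

# Row-wise minimum across all columns; NAs are skipped
df$Severity <- do.call(pmin, c(levels_by_col, na.rm = TRUE))
df$Severity
# [1] 2 2 1 1 1 2 2 2 3 1
```

With `na.rm = TRUE`, `pmin()` still returns NA for a row where every column is NA, so a patient with no recorded injury ends up with a missing Severity rather than a misleading level.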

Output:

# A tibble: 10 × 11
   Person  Stab  Shot   Cut ShotBow Punched Slapped `Car Accident` `Bicycle Accident` FellOver Severity
    <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>          <dbl>              <dbl>    <dbl>    <dbl>
 1      1     1     0     0       0       0       0              0                  0        0        2
 2      2     0     0     1       0       0       0              0                  0        0        2
 3      3     1     1     1       1       1       1              1                  1        1        1
 4      4     0     1     1       0       0       0              0                  0        0        1
 5      5     0     0     0       1       1       0              0                  0        0        1
 6      6     1     0     0       0       0       1              0                  0        0        2
 7      7     0     0     0       0       0       0              0                  1        1        2
 8      8     0     0     1       0       1       0              0                  0        0        2
 9      9     0     0     0       0       0       1              0                  0        1        3
10     10     0     1     1       0       0       0              0                  0        0        1
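Since the question's title asks for a factor, the numeric Severity column can be converted with `factor()`. This is a sketch: the values are copied from the Severity column in the output above, and the labels are taken from the level descriptions in the question.

```r
severity <- c(2, 2, 1, 1, 1, 2, 2, 2, 3, 1)  # Severity column from the output above

# Map the codes 1-3 to descriptive level labels
severity_f <- factor(severity,
                     levels = 1:3,
                     labels = c("Level 1 (worst)", "Level 2 (not as bad)", "Level 3 (really not bad)"))

table(severity_f)  # number of patients at each severity level
```

Passing `levels = 1:3` explicitly keeps all three levels in the factor even if a subset of the data happens to contain no patients at one of them.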