Hierarchy across rows for the same id

Time:05-20

I have a data set with many observations for X individuals, with several rows for some individuals. For each row, I have assigned a classification (the variable clinical_significance) that takes three values, in prioritized order: definite disease, possible, colonization. Now I would like to keep only one row per individual, carrying the "highest classification" across that individual's rows: definite disease if present, otherwise possible, otherwise colonization. Any good suggestions on how to do this?

For instance, in the example below, I would like clinical_significance for ID #23 to be 'definite disease' on all rows, as this outranks 'possible'.

id   id_row number_of_samples  species_ny   clinical_significa…
18     1         2                  MAC            possible           
18     2         2                  MAC            possible           
20     1         2                  scrofulaceum   possible           
20     2         2                  scrofulaceum   possible           
23     1         2                  MAC            possible           
23     2         2                  MAC            definite disease

CodePudding user response:

Making a reproducible example:

df <- structure(
  list(
    id = c("18", "18", "20", "20", "23", "23"),
    id_row = c("1","2", "1", "2", "1", "2"), 
    number_of_samples = c("2", "2", "2","2", "2", "2"), 
    species_ny = c("MAC", "MAC", "scrofulaceum", "scrofulaceum", "MAC", "MAC"), 
    clinical_significance = c("possible", "possible", "possible", "possible", "possible", "definite disease")
  ),
  row.names = c(NA, -6L), class = c("data.frame")
)

The idea is to turn clinical_significance into a factor, which is stored as integer codes rather than as character strings (i.e. 1 = definite disease, 2 = possible, 3 = colonization). Then, for each ID, take the row with the lowest code.

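library(dplyr)

df_prio <- df |> 
  mutate(
    # level order encodes priority: 1 = definite disease, 2 = possible, 3 = colonization
    fct_clin_sig = factor(
      clinical_significance, 
      levels = c("definite disease", "possible", "colonization")
    )
  ) |> 
  group_by(id) |> 
  # with_ties = FALSE keeps exactly one row per id, even when several rows share the top level
  slice_min(fct_clin_sig, n = 1, with_ties = FALSE) |> 
  ungroup()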
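
To see why the factor approach works, here is a minimal base R sketch of how the level order becomes integer codes. The vectors `priority` and `x` are made up for illustration and are not part of the question's data.

```r
# Priority order: earlier levels outrank later ones
priority <- c("definite disease", "possible", "colonization")

# Hypothetical classifications for one individual
x <- c("possible", "colonization", "definite disease")
f <- factor(x, levels = priority)

codes <- as.integer(f)          # position of each value in `priority`
best  <- priority[min(codes)]   # label with the lowest code, i.e. highest priority

print(codes)
print(best)
```

The lowest code corresponds to the highest-ranked classification, which is what taking the minimum per id relies on.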

CodePudding user response:

I fixed it using

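library(dplyr)
df <- df %>% 
  group_by(id) %>% 
  # if/else (not ifelse) so groups without "definite disease" keep each row's own value
  mutate(clinical_significance_new = if (any(clinical_significance == "definite disease")) "definite disease" else as.character(clinical_significance)) %>% 
  ungroup()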
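
This approach only promotes "definite disease"; it does not rank "possible" above "colonization". A possible generalization, sketched below, looks up each value's position in a priority vector with `match()` and assigns the group's best label to every row. The data frame `df_toy` (including id 30 and its "colonization" rows) and the names `priority` and `df_ranked` are hypothetical, chosen so that all three levels appear.

```r
library(dplyr)

# Toy data including "colonization" rows, which the original example lacks
df_toy <- data.frame(
  id = c("18", "18", "23", "23", "30", "30"),
  clinical_significance = c("possible", "colonization",
                            "possible", "definite disease",
                            "colonization", "colonization")
)

# Earlier entries outrank later ones
priority <- c("definite disease", "possible", "colonization")

df_ranked <- df_toy %>%
  group_by(id) %>%
  mutate(
    # match() gives each value's rank; the minimum rank is the best label in the group
    clinical_significance_new =
      priority[min(match(clinical_significance, priority))]
  ) %>%
  ungroup()

print(df_ranked)
```

All rows are kept here; if one row per id is wanted afterwards, `distinct(id, .keep_all = TRUE)` would be one way to reduce the result. Note that a value missing from `priority` makes `match()` return NA, so the whole group would get NA in this sketch.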