Home > Mobile >  Combine row values based on multiple ID in R
Combine row values based on multiple ID in R

Time:07-10

I have dataframe df of which a sample is shown below. I need to combine the rows by study_id, lab_study_dt, and lab_study_time, and have NA and non-NA values across labs and detection limits on the same row grouped by those three key variables.

I've tried group by and summarise_all to do this but didn't get the outcome I'm looking for.

df %>%
    group_by(study_id,lab_study_dt,lab_study_time) %>%
    summarise_all(funs(toString(na.omit(.))))
study_id lab_study_dt lab_study_time lab_polys lab_lymphs lab_mono lab_eos lab_basos lab_bands lab_wbc_count lab_rbc_count protein_limit_of_detection lab_protein gluc_limit_of_detection lab_glucose
Jane 8/13/2011 0:12 NA NA NA NA NA NA 1 NA NA NA NA NA
Jane 8/13/2011 0:12 NA NA NA NA NA NA NA NA NA NA NA NA
Jane 3/4/2013 15:27 NA 60 NA NA NA NA NA NA NA NA NA NA
Jane 3/4/2013 15:27 NA NA NA NA NA NA NA 10000 NA NA NA NA
Jane 3/4/2013 15:27 8 NA NA NA NA NA NA NA NA NA NA NA
Jane 3/4/2013 15:27 NA NA NA NA NA 1 NA NA NA NA NA NA
Jane 3/4/2013 15:27 NA NA NA NA NA NA 149 NA NA NA NA NA
Jane 3/4/2013 15:27 NA NA 31 NA NA NA NA NA 1 56 NA NA
George 4/20/2021 21:18 NA 60 NA NA NA NA NA NA NA NA NA NA
George 4/20/2021 21:18 NA NA NA NA NA NA NA 10000 NA NA NA NA
George 4/23/2021 15:27 8 NA NA NA NA NA NA NA NA NA 2 3
George 4/23/2021 12:27 NA 65 NA NA NA NA NA NA 1 >10 NA NA
George 4/23/2021 12:27 NA NA NA NA NA 1 149 NA NA NA NA NA
George 4/23/2021 12:27 NA NA 31 NA NA NA NA 56 NA NA NA NA

The end dataframe would have one row per unique study_id, study date, and study time together, with all associated values along that row. So for example, entry Jane - 3/4/2013 - 15:27 would look like below:

study_id lab_study_dt lab_study_time lab_polys lab_lymphs lab_mono lab_eos lab_basos lab_bands lab_wbc_count lab_rbc_count protein_limit_of_detection lab_protein gluc_limit_of_detection lab_glucose
Jane 3/4/2013 15:27 8 60 31 NA NA 1 149 1000 1 56 NA NA

Thank you in advance

CodePudding user response:

We could group by the id/dt/time, then use tidyr::fill on all columns (ie everything()) to take any non-NA values and copy them first down through any NAs and then up though any NAs ("downup", my arbitrary choice). Finally, we can keep just the first slice within each group and then remove the grouping.

library(tidyverse)    
df %>%
  group_by(study_id,lab_study_dt,lab_study_time) %>%
  fill(everything(), .direction = "downup") %>%
  slice(1) %>%
  ungroup()

Is this the output you'd expect?

# A tibble: 5 × 15
  study_id lab_study_dt lab_study_time lab_polys lab_lymphs lab_mono lab_eos lab_basos lab_bands lab_wbc_count lab_rbc_count protein_limit_of_detection lab_protein gluc_limit_of_detecti… lab_glucose
  <chr>    <chr>        <chr>              <int>      <int>    <int> <lgl>   <lgl>         <int>         <int>         <int>                      <int> <chr>                        <int>       <int>
1 George   4/20/2021    21:18                 NA         60       NA NA      NA               NA            NA         10000                         NA NA                              NA          NA
2 George   4/23/2021    12:27                 NA         65       31 NA      NA                1           149            56                          1 >10                             NA          NA
3 George   4/23/2021    15:27                  8         NA       NA NA      NA               NA            NA            NA                         NA NA                               2           3
4 Jane     3/4/2013     15:27                  8         60       31 NA      NA                1           149         10000                          1 56                              NA          NA
5 Jane     8/13/2011    0:12                  NA         NA       NA NA      NA               NA             1            NA                         NA NA                              NA          NA
  • Related