Combine row values based on multiple ID in R-CodePudding

I have dataframe df of which a sample is shown below. I need to combine the rows by study_id, lab_study_dt, and lab_study_time, and have NA and non-NA values across labs and detection limits on the same row grouped by those three key variables.

I've tried group by and summarise_all to do this but didn't get the outcome I'm looking for.

df %>%
    group_by(study_id,lab_study_dt,lab_study_time) %>%
    summarise_all(funs(toString(na.omit(.))))

study_id	lab_study_dt	lab_study_time	lab_polys	lab_lymphs	lab_mono	lab_eos	lab_basos	lab_bands	lab_wbc_count	lab_rbc_count	protein_limit_of_detection	lab_protein	gluc_limit_of_detection	lab_glucose
Jane	8/13/2011	0:12	NA	NA	NA	NA	NA	NA	1	NA	NA	NA	NA	NA
Jane	8/13/2011	0:12	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Jane	3/4/2013	15:27	NA	60	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Jane	3/4/2013	15:27	NA	NA	NA	NA	NA	NA	NA	10000	NA	NA	NA	NA
Jane	3/4/2013	15:27	8	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Jane	3/4/2013	15:27	NA	NA	NA	NA	NA	1	NA	NA	NA	NA	NA	NA
Jane	3/4/2013	15:27	NA	NA	NA	NA	NA	NA	149	NA	NA	NA	NA	NA
Jane	3/4/2013	15:27	NA	NA	31	NA	NA	NA	NA	NA	1	56	NA	NA
George	4/20/2021	21:18	NA	60	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
George	4/20/2021	21:18	NA	NA	NA	NA	NA	NA	NA	10000	NA	NA	NA	NA
George	4/23/2021	15:27	8	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	3
George	4/23/2021	12:27	NA	65	NA	NA	NA	NA	NA	NA	1	>10	NA	NA
George	4/23/2021	12:27	NA	NA	NA	NA	NA	1	149	NA	NA	NA	NA	NA
George	4/23/2021	12:27	NA	NA	31	NA	NA	NA	NA	56	NA	NA	NA	NA

The end dataframe would have one row per unique study_id, study date, and study time together, with all associated values along that row. So for example, entry Jane - 3/4/2013 - 15:27 would look like below:

study_id	lab_study_dt	lab_study_time	lab_polys	lab_lymphs	lab_mono	lab_eos	lab_basos	lab_bands	lab_wbc_count	lab_rbc_count	protein_limit_of_detection	lab_protein	gluc_limit_of_detection	lab_glucose
Jane	3/4/2013	15:27	8	60	31	NA	NA	1	149	1000	1	56	NA	NA

Thank you in advance

CodePudding user response：

We could group by the id/dt/time, then use tidyr::fill on all columns (ie everything()) to take any non-NA values and copy them first down through any NAs and then up though any NAs ("downup", my arbitrary choice). Finally, we can keep just the first slice within each group and then remove the grouping.

library(tidyverse)    
df %>%
  group_by(study_id,lab_study_dt,lab_study_time) %>%
  fill(everything(), .direction = "downup") %>%
  slice(1) %>%
  ungroup()

Is this the output you'd expect?

# A tibble: 5 × 15
  study_id lab_study_dt lab_study_time lab_polys lab_lymphs lab_mono lab_eos lab_basos lab_bands lab_wbc_count lab_rbc_count protein_limit_of_detection lab_protein gluc_limit_of_detecti… lab_glucose
  <chr>    <chr>        <chr>              <int>      <int>    <int> <lgl>   <lgl>         <int>         <int>         <int>                      <int> <chr>                        <int>       <int>
1 George   4/20/2021    21:18                 NA         60       NA NA      NA               NA            NA         10000                         NA NA                              NA          NA
2 George   4/23/2021    12:27                 NA         65       31 NA      NA                1           149            56                          1 >10                             NA          NA
3 George   4/23/2021    15:27                  8         NA       NA NA      NA               NA            NA            NA                         NA NA                               2           3
4 Jane     3/4/2013     15:27                  8         60       31 NA      NA                1           149         10000                          1 56                              NA          NA
5 Jane     8/13/2011    0:12                  NA         NA       NA NA      NA               NA             1            NA                         NA NA                              NA          NA