Home > OS >  Average multiple columns within groups where some values are missing
Average multiple columns within groups where some values are missing

Time:04-14

If this is my dataset

 Id   Date       time     Col1   Col2   Col3
 1    1/15/2001  1       -1.24  -0.23  -8.517
 1    4/12/2001  2       -4.35   0.45  -1.924
 2   10/18/2018  1       -6.11  -1.90  -4.465
 2   10/18/2018  1       -3.18   5.69  -5.46
 3   02/12/2017  2       -2.03          6.656
 3   02/12/2017  2               3.78   1.747
 3   29/07/2019  3       -7.418 -5.843 -3.785

My goal is to keep only rows that are unique based on :ID Date Time For same ID,Date and Time, take the average of Col1,Col2,Col3. The final dataset should look like this

Id   Date       time     Col1   Col2   Col3
 1    1/15/2001  1       -1.24  -0.23  -8.51
 1    4/12/2001  2       -4.35   0.45  -1.92
 
 2   10/18/2018  1        4.64   1.89  -4.96

 3   02/12/2017  2       -2.03   3.78   4.20

 3   29/07/2019  3       -7.418 -5.843 -3.785

Thanks in advance of any suggestions.

CodePudding user response:

First group and then use across with na.rm=TRUE:

library(dplyr)

df %>% 
  group_by(Id, Date, time) %>% 
  summarise(across(starts_with("col"), ~mean(., na.rm = TRUE)))
     Id Date        time  Col1  Col2  Col3
  <int> <chr>      <int> <dbl> <dbl> <dbl>
1     1 1/15/2001      1 -1.24 -0.23 -8.52
2     1 4/12/2001      2 -4.35  0.45 -1.92
3     2 10/18/2018     1 -4.65  1.90 -4.96
4     3 02/12/2017     2 -2.03  3.78  4.20
5     3 29/07/2019     3 -7.42 -5.84 -3.78

CodePudding user response:

If you want to use base R:

> aggregate(.~Id Date time, dat, mean, na.rm=TRUE, na.action = NULL)

  Id       Date time   Col1   Col2    Col3
1  1  1/15/2001    1 -1.240 -0.230 -8.5170
2  2 10/18/2018    1 -4.645  1.895 -4.9625
3  3 02/12/2017    2 -2.030  3.780  4.2015
4  1  4/12/2001    2 -4.350  0.450 -1.9240
5  3 29/07/2019    3 -7.418 -5.843 -3.7850

  • Related