Calculating column means of levels in a data frame with NAs

Time:12-28

Somehow I only get the means of a few columns; the rest come back as NA. I tried using is.na = T and omit.na = T, but I get the same wrong results. This is the code I tried:

library(dplyr)

Test1162019 <- try1 %>% 
  filter(DateT == "11/6/2019") %>% 
  summarise(across(where(is.numeric), ~ mean(.x, na.omit = TRUE)))

IDs DateT   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S
A1  7/5/2019    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
A2  7/5/2019    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
A3  7/5/2019    5   5   5   6   7   7   8   9   10  10  11  12  12  13  14  15  15  15  15
A4  7/5/2019    33  34  34  35  35  36  37  37  38  38  38  38  38  38  38  38  38  38  38
A5  7/5/2019    3   3   3   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
A6  7/5/2019    21  21  21  21  21  21  21  21  21  21  21  21  21  21  21  21  21  21  21
A7  7/5/2019    34  35  35  36  36  36  36  36  37  37  38  38  38  39  39  39  40  40  41
B1  8/5/2019    65  65  65  66  66  67  67  67  67  67  67  68  68  69  69  70  71  71  72
B2  8/5/2019    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
B3  8/5/2019    9   9   9   9   9   9   9   9   9   9   9   9   9   9   10  10  11  12  13
B3  8/5/2019    11  11  11  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
B5  8/5/2019    2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
B6  8/5/2019    21  22  23  24  24  25  26  27  28  28  29  30  31  32  32  33  33  33  33
B7  8/5/2019    8   8   9   10  11  11  12  13  13  14  15  15  16  17  17  18  18  19  19
B8  8/5/2019    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
B9  8/5/2019    47  47  47  47  47  47  47  47  47  47  48  49  49  50  51  51  52  53  53
B10 8/5/2019    67  67  68  68  69  69  70  71  72  72  73  73  73  73  73  73  73  73  73
B11 8/5/2019    38  38  38  38  38  38  38  39  39  39  39  40  40  40  40  40  41  41  41
C1  11/6/2019   71  71  71  71  71  71  71  71  71  71  71  71  71  71  71  71  71  71  71
C2  11/6/2019   38  39  40  41  42  43  43  44  44  44  45  45  NA  NA  NA  NA  NA  NA  NA
C3  11/6/2019   40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40  40
C4  11/6/2019   2   2   2   2   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
C5  15/5/2020   42  42  43  44  45  45  46  46  46  46  46  46  NA  NA  NA  NA  NA  NA  NA
D1  15/5/2020   41  41  42  43  43  44  44  45  45  46  47  47  48  49  50  51  51  51  52
D2  15/5/2020   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
D3  15/5/2020   46  46  46  46  47  47  47  47  47  47  48  48  49  50  50  51  51  51  51
D4  15/5/2020   37  37  37  37  37  37  37  37  37  37  37  37  37  37  37  37  37  37  37
D5  15/5/2020   31  31  31  31  31  31  32  32  33  34  34  34  34  34  34  34  35  35  35
D6  15/5/2020   37  37  38  39  40  40  41  42  43  44  44  45  46  47  48  48  48  49  49

How do I solve this problem?

Answer:

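I'm not a dplyr expert, but the base R way to tell mean() to remove (ignore) NAs is na.rm = TRUE. mean() has no na.omit argument, so na.omit = TRUE is silently ignored and na.rm stays at its default of FALSE. That is why, for the 11/6/2019 rows, columns A to L (no NAs) get a mean while M to S (where C2 is NA) come back as NA.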
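A minimal base R check of the difference, using the values of column M for the four 11/6/2019 rows in the question's table (71, NA, 40, 3):

```r
x <- c(71, NA, 40, 3)  # column M for rows C1 to C4 (11/6/2019)

# na.omit is not an argument of mean(); it is silently swallowed by the
# ... argument, so na.rm keeps its default FALSE and the NA propagates.
bad <- mean(x, na.omit = TRUE)

# na.rm = TRUE drops the NA first: (71 + 40 + 3) / 3 = 38.
good <- mean(x, na.rm = TRUE)

c(bad = bad, good = good)
```

In the question's pipeline the fix is therefore only the argument name inside the lambda: ~ mean(.x, na.rm = TRUE).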

Answer:

You can use colMeans() with na.rm = TRUE on a subset of columns, s.

s <- c("D", "E", "F")  ## selected columns
colMeans(dat[s], na.rm=TRUE)  ## dat: the full data frame from the question
#        D        E        F 
# 28.22222 28.51852 28.70370 

Or use R's native pipe (|>, available from R 4.1.0).

dat[s] |> 
  colMeans(na.rm=TRUE)
#        D        E        F 
# 28.22222 28.51852 28.70370 
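The colMeans() calls above average over every row of dat. To reproduce what the question's dplyr pipeline was after (one date, numeric columns only, NAs ignored) in base R, and to get the same means for every date level in one step, here is a self-contained sketch. It uses a small hypothetical subset of the question's table (dat_sub, columns A, L and M only); subset(), Filter() and aggregate() are base R, and the native pipe needs R 4.1.0 or later.

```r
# Hypothetical subset of the question's table: columns A, L and M only,
# the four 11/6/2019 rows plus rows C5 and D1 from 15/5/2020.
dat_sub <- data.frame(
  IDs   = c("C1", "C2", "C3", "C4", "C5", "D1"),
  DateT = c(rep("11/6/2019", 4), rep("15/5/2020", 2)),
  A     = c(71, 38, 40, 2, 42, 41),
  L     = c(71, 45, 40, 3, 46, 47),
  M     = c(71, NA, 40, 3, NA, 48)
)

# One date: keep its rows, drop the non-numeric columns, then average
# each column while ignoring NAs (mirrors filter() + across(where(is.numeric))).
means_day <- subset(dat_sub, DateT == "11/6/2019") |>
  Filter(f = is.numeric) |>
  colMeans(na.rm = TRUE)
means_day  # A = 37.75, L = 39.75, M = 38

# Every date at once: one row per DateT level; na.rm = TRUE is passed on to mean().
num_cols <- vapply(dat_sub, is.numeric, logical(1))
by_date <- aggregate(dat_sub[num_cols],
                     by = list(DateT = dat_sub$DateT),
                     FUN = mean, na.rm = TRUE)
by_date
```

The data-frame form of aggregate() is used on purpose: the formula interface defaults to na.action = na.omit, which drops every row that has an NA in any column (here C2 and C5) before averaging, whereas this form only skips NAs column by column.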
