Home > Software design >  R Dplyr summarize rows except when only NAs
R Dplyr summarize rows except when only NAs

Time:11-04

This is what my dataframe looks like

I have a dataframe of several columns and several rows per Participant_ID. I want to sum data for all lines of Participant_ID, to obtain one value per Participant_ID. The problem is that some columns are empty (all NAs), and for these columns I want to keep NA as a result. But when I sum with na.rm = T, it transforms the sum of NAs to 0.

I am using :

 df = df %>%
  group_by(Participant_ID) %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = T)))

How can I exclude columns (after group_by) with only NAs ? Or filter columns (after group_by) that contain at least one numeric value ?

Thanks a lot for your help !!

CodePudding user response:

You just need the function inside across to handle the NA column differently.

Suppose we have this data frame:

df <- data.frame(Participant_ID = c('A', 'A', 'A'),
  a = 1:3, b = c(4, NA, 6), c = as.numeric(c(NA, NA, NA)))

Then we can do

library(dplyr)

df %>%
  group_by(Participant_ID) %>%
  summarise(across(where(is.numeric), 
                   ~ if(all(is.na(.x))) NA_real_ else sum(.x, na.rm = T)))
#> # A tibble: 1 x 4
#>   Participant_ID     a     b     c
#>   <chr>          <int> <dbl> <dbl>
#> 1 A                  6    10    NA

Created on 2022-11-03 with reprex v2.0.2

CodePudding user response:

We may use fsum from collapse

library(collapse)
 fsum(df[-1], df$Participant_ID)
  a  b  c
A 6 10 NA

data

df <- data.frame(Participant_ID = c('A', 'A', 'A'),
  a = 1:3, b = c(4, NA, 6), c = as.numeric(c(NA, NA, NA)))
  • Related