This is what my dataframe looks like
I have a dataframe of several columns and several rows per Participant_ID. I want to sum data for all lines of Participant_ID, to obtain one value per Participant_ID. The problem is that some columns are empty (all NAs), and for these columns I want to keep NA as a result. But when I sum with na.rm = T, it transforms the sum of NAs to 0.
I am using :
df = df %>%
group_by(Participant_ID) %>%
summarise(across(where(is.numeric), ~ sum(.x, na.rm = T)))
How can I exclude columns (after group_by) with only NAs ? Or filter columns (after group_by) that contain at least one numeric value ?
Thanks a lot for your help !!
CodePudding user response:
You just need the function inside across
to handle the NA column differently.
Suppose we have this data frame:
df <- data.frame(Participant_ID = c('A', 'A', 'A'),
a = 1:3, b = c(4, NA, 6), c = as.numeric(c(NA, NA, NA)))
Then we can do
library(dplyr)
df %>%
group_by(Participant_ID) %>%
summarise(across(where(is.numeric),
~ if(all(is.na(.x))) NA_real_ else sum(.x, na.rm = T)))
#> # A tibble: 1 x 4
#> Participant_ID a b c
#> <chr> <int> <dbl> <dbl>
#> 1 A 6 10 NA
Created on 2022-11-03 with reprex v2.0.2
CodePudding user response:
We may use fsum
from collapse
library(collapse)
fsum(df[-1], df$Participant_ID)
a b c
A 6 10 NA
data
df <- data.frame(Participant_ID = c('A', 'A', 'A'),
a = 1:3, b = c(4, NA, 6), c = as.numeric(c(NA, NA, NA)))