I am working with the following dataframe:
cell_a cell_b cell_c group
N/A 2.5 5 A
1.2 3.6 N/A A
3 2.1 3.2 A
N/A N/A 1 B
1.2 N/A N/A B
2 N/A N/A B
I would like to count the number of rows with a non-null value in each column and summarise it by group.
The outcome should be stored in new dataframe, such as:
cell_a cell_b cell_c group
2 3 2 A
2 0 1 B
I tried with:
df_2 <- aggregate(df[1:3], list(df$group), length)
but it's indeed giving me the total length of each rows for each group. I've also tried adding na.action = na.omit
or na.rm=TRUE
after length
but doesn't work.
What else can I use in this code to ignore the N/A values?
Thank you very much in advance for your help!
CodePudding user response:
With aggregate
, use sum
of non-NA elements (assuming the missing value is NA
) as length
returns the total number of elements (per group as we are grouping by group)
aggregate(. ~ group, df, FUN = function(x) sum(!is.na(x)), na.action = NULL)
If the NA
value is a string element "N/A"
aggregate(. ~ group, df, FUN = function(x) sum(x != "N/A"), na.action = NULL)
group cell_a cell_b cell_c
1 A 2 3 2
2 B 2 0 1
data
df <- structure(list(cell_a = c("N/A", "1.2", "3", "N/A", "1.2", "2"
), cell_b = c("2.5", "3.6", "2.1", "N/A", "N/A", "N/A"), cell_c = c("5",
"N/A", "3.2", "1", "N/A", "N/A"), group = c("A", "A", "A", "B",
"B", "B")), class = "data.frame", row.names = c(NA, -6L))
CodePudding user response:
Here is how we could do it with dplyr
:
- change
N\A
notNA
withna_if
andacross
- group
- summarise with
across
library(dplyr)
df %>%
mutate(across(starts_with("cell"), ~na_if(., "N/A"))) %>%
group_by(group) %>%
summarise(across(starts_with("cell"), ~sum(!is.na(.))))
group cell_a cell_b cell_c
<chr> <int> <int> <int>
1 A 2 3 2
2 B 2 0 1