This seems like a really basic question but I can't find a solution that will do what I want for all columns of a dataframe.
I have a dataframe:
df = data.frame(cats = c("A", "B", "C", NA, NA), dogs = c(-99, "F", NA, -99, "H"))
I want to count the number of times NA occurs within each column, and also the number of times -99 occurs within each column. I am able to use summarise_all to count the number of NAs per column:
df %>% summarise_all(~ sum(is.na(.)))
Which produces the desired result:
cats dogs
2 1
But I can't figure out how to adapt this to count the number of times -99 appears per column. I've tried the following:
df %>% summarise_all(~ sum(-99))
Which produces this result:
cats dogs
-99 -99
This result shows -99 for each column, even though it never occurs within cats, and it doesn't produce the number of times -99 occurs. There must be an easy way to do this? Thanks for any help!
CodePudding user response:
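You're almost there. sum(-99) just adds up the single number -99, so you need to compare each column to -99 with . == -99 and sum the resulting TRUE/FALSE values. Because comparing NA to anything gives NA, you also need na.rm = TRUE inside sum, otherwise every column containing an NA returns NA.
> df %>% summarise_all(~ sum(. == -99, na.rm = TRUE))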
cats dogs
1 0 2
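As a side note, summarise_all is superseded in dplyr 1.0.0 and later, where across() is the suggested replacement. The following is a minimal sketch of the same count written that way, assuming a dplyr version that provides across(); the name counts is only illustrative and not from the original post. Note that dogs is a character column here (mixing -99 with "F" and "H" coerces everything to strings), and the comparison still works because R coerces -99 to "-99" before comparing.

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(cats = c("A", "B", "C", NA, NA),
                 dogs = c(-99, "F", NA, -99, "H"))

# Count how often -99 appears in every column.
# .x == -99 yields TRUE/FALSE/NA; na.rm = TRUE drops the NA comparisons.
counts <- df %>%
  summarise(across(everything(), ~ sum(.x == -99, na.rm = TRUE)))

print(counts)
```

If only some columns should be checked, everything() can be swapped for a narrower selection such as c(cats, dogs).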
CodePudding user response:
Using base R:
colSums(df == -99, na.rm = TRUE)
cats dogs
0 2
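Since the question asks for both the NA count and the -99 count, the two can be collected side by side with the same base R approach. This is just one possible sketch; the names counts, n_na and n_minus99 are made up for illustration.

```r
# Same data as in the question.
df <- data.frame(cats = c("A", "B", "C", NA, NA),
                 dogs = c(-99, "F", NA, -99, "H"))

# One row per original column, one column per kind of count.
counts <- data.frame(
  n_na      = colSums(is.na(df)),               # number of NA values
  n_minus99 = colSums(df == -99, na.rm = TRUE)  # number of -99 values
)

print(counts)
```

The row names of counts are the column names of df, so a single value can be looked up with, for example, counts["dogs", "n_minus99"].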