I have a dataframe that contains several columns.
I'm trying to count the rows where a value appears in only one column, in at least two columns, and in all columns.
test <- data.frame(
  A = c("inactive", "inactive", "active", "active"),
  B = c("active", "active", "inactive", "active"),
  C = c("active", "inactive", "inactive", "active")
)
I want to know the number of rows that have at least one 'active', at least two 'active', and the number of rows where all columns are 'active'.
So I tried this:
library(dplyr)
all <- filter(test, A == "active" & B == "active" & C == "active")
Then I take the number of rows of the resulting dataframe. I can do the same for the other conditions (shared between A and B, B and C, A and C), but I wonder if there is a better way to compute this.
Thanks
CodePudding user response:
A possible solution:
library(tidyverse)
test <- data.frame(
  A = c("inactive", "inactive", "active", "active"),
  B = c("active", "active", "inactive", "active"),
  C = c("active", "inactive", "inactive", "active")
)
test %>%
  mutate(nActives = rowSums(. == "active")) %>%  # 'active' values per row
  group_by(nActives) %>%
  summarise(nRows = n()) %>%                     # rows per count
  ungroup()
#> # A tibble: 3 × 2
#>   nActives nRows
#>      <dbl> <int>
#> 1        1     2
#> 2        2     1
#> 3        3     1
CodePudding user response:
We may use a base R solution:
table(rowSums(test == 'active'))
1 2 3
2 1 1
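The table above gives exact counts (rows with exactly 1, 2 or 3 'active' values), while the question asks for "at least" counts. A minimal sketch of one way to get those, assuming the same `test` data frame as in the question; `n_active` and `at_least` are names made up for illustration:

```r
test <- data.frame(
  A = c("inactive", "inactive", "active", "active"),
  B = c("active", "active", "inactive", "active"),
  C = c("active", "inactive", "inactive", "active")
)

# Number of 'active' values in each row
n_active <- rowSums(test == "active")

# For each threshold k = 1..ncol(test), count the rows with at least k 'active' values
thresholds <- seq_len(ncol(test))
at_least <- sapply(thresholds, function(k) sum(n_active >= k))
names(at_least) <- paste0(">=", thresholds)
at_least
```

With this data, 4 rows have at least one 'active', 2 rows have at least two, and 1 row has all three.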
To keep only the rows with at least two 'active' values:
> subset(test, rowSums(test == 'active') >= 2)
         A      B      C
1 inactive active active
4   active active active
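The question also mentions the pairwise cases (shared between A and B, B and C, A and C). One possible way to avoid writing each condition by hand is to loop over the column pairs with `combn()`. This is only a sketch, assuming the same `test` data frame; `is_active`, `pairs` and `shared` are illustrative names, and each count includes rows where the third column is 'active' too:

```r
test <- data.frame(
  A = c("inactive", "inactive", "active", "active"),
  B = c("active", "active", "inactive", "active"),
  C = c("active", "inactive", "inactive", "active")
)

# Logical matrix: TRUE where the cell is 'active'
is_active <- test == "active"

# All pairs of column names, one pair per column: A-B, A-C, B-C
pairs <- combn(colnames(is_active), 2)

# For each pair, count the rows where both columns are 'active'
shared <- apply(pairs, 2, function(p) sum(is_active[, p[1]] & is_active[, p[2]]))
names(shared) <- apply(pairs, 2, paste, collapse = "&")
shared
```

With this data, A&B and A&C are each shared by 1 row, and B&C by 2 rows.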