How to summarize the number and proportion of rows per group where two columns meet a condition

Time:06-02

Here are the sample data and the result of my attempt:

tta <- data.frame(v1=c(8, 6, 1, 3, 8, 3, 3, 4, 5, 5, 7, 3, 4, 2, 8, 2, 2, 2, 5, 8, 4, 5, 3, 5, 3),
                  v2=c(9, 5, 3, 5, 4, 4, 8, 3, 1, 3, 3, 7, 7, 7, 9, 3, 7, 3, 3, 8, 4, 6, 3, 7, 5),
                  group=c(rep(c(1:5), each=5)))

library(dplyr)

## Attempt: filtering first drops any group with no matching rows,
## so the result would need a downstream merge to restore it
resulta <- tta %>%
    filter(v1<=6 & v2<=6) %>%
    group_by(group) %>%
    summarise(n=n(), frac=n/5)

## resulta
##     group 3 is missing because none of its rows meets "v1<=6 & v2<=6"
## 
## # A tibble: 4 × 3
##   group     n  frac
##   <int> <int> <dbl>
## 1     1     3   0.6
## 2     2     4   0.8
## 3     4     3   0.6
## 4     5     4   0.8

## expected result
##
## # A tibble: 5 × 3
##   group     n  frac
##   <int> <int> <dbl>
## 1     1     3   0.6
## 2     2     4   0.8
## 3     3     0   0.0
## 4     4     3   0.6
## 5     5     4   0.8
##

There are two problems:

  1. Group 3 is lost because none of its rows meets the criterion ("v1<=6 & v2<=6") and filter() runs before the grouping.
  2. frac=n/5 hard-codes the group size, so the proportion is wrong whenever a group does not have exactly 5 rows.

Is there a solution? Methods other than dplyr are fine too. Thanks for your help.

CodePudding user response:

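You could flag each row instead of filtering, then sum the flag within each group so that every group is kept. Dividing by n() uses each group's actual row count rather than a fixed 5: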

tta %>%
  mutate(key = as.numeric(v1<=6 & v2<=6)) %>%
  group_by(group) %>%
  summarize(n = sum(key), frac = n/n())

  group     n  frac
  <int> <dbl> <dbl>
1     1     3   0.6
2     2     4   0.8
3     3     0   0  
4     4     3   0.6
5     5     4   0.8
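
Since the question says methods other than dplyr are fine, here is a minimal base R sketch of the same idea. It is only one possible approach, not part of the original answer; the names hit and result_base are made up for illustration. It relies on tapply() evaluating every group present in the grouping vector, and on the mean of a logical vector being the share of TRUE values.

```r
# Same sample data as in the question.
tta <- data.frame(
  v1 = c(8, 6, 1, 3, 8, 3, 3, 4, 5, 5, 7, 3, 4, 2, 8, 2, 2, 2, 5, 8, 4, 5, 3, 5, 3),
  v2 = c(9, 5, 3, 5, 4, 4, 8, 3, 1, 3, 3, 7, 7, 7, 9, 3, 7, 3, 3, 8, 4, 6, 3, 7, 5),
  group = rep(1:5, each = 5)
)

# Logical flag per row: TRUE when both columns meet the criterion.
# (hit is a hypothetical helper name.)
hit <- tta$v1 <= 6 & tta$v2 <= 6

# tapply() computes one value for every group in tta$group, so a group
# with no matching rows gets 0 instead of being dropped.
n    <- tapply(hit, tta$group, sum)   # count of TRUE per group
frac <- tapply(hit, tta$group, mean)  # share of TRUE per group

# Assemble the summary table (result_base is a hypothetical name).
result_base <- data.frame(
  group = as.integer(names(n)),
  n     = as.integer(n),
  frac  = as.numeric(frac)
)
result_base
```

Because the proportion comes from mean() over each group's own rows, it does not depend on the groups having equal size.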