Count certain values of column in group id-CodePudding

I try to count the number of A's within-group id.

df<- data.frame( id= c(1, 1, 1, 1,  2, 2, 2, 2,  2, 2, 3, 3, 3,3 ,3,3,4,4,4, 5,5,6,6), value= c(NA, NA,"A", "A", NA,NA,"A","A","B","A",NA,NA,"B","A","B","A", NA, NA,"B",NA, NA, NA,NA))

Desired Output

 id value number_A
  1   NA        2
  1   NA        2
  1     A        2
  1     A        2
  2   NA        3
  2   NA        3
  2     A        3
  2     A        3
  2     B        3
  2     A        3
  3   NA         2
  3   NA         2
  3     B        2
  3     A        2
  3     B        2
  3     A        2
  4   NA         0
  4   NA         0
  4     B        0
  5   NA         0
  5   NA         0
  6   NA         0
  6   NA         0

I try it with the following code:

library(dplyr)
df1 <- df %>% group_by(id) %>% 
  mutate(count =  row_number(value=="A"))

CodePudding user response：

You could use

library(dplyr)

df %>% 
  group_by(id) %>% 
  mutate(number_A = sum(value == "A", na.rm = TRUE)) %>% 
  ungroup()

This returns

# A tibble: 23 x 3
      id value number_A
   <dbl> <chr>    <int>
 1     1 NA           2
 2     1 NA           2
 3     1 A            2
 4     1 A            2
 5     2 NA           3
 6     2 NA           3
 7     2 A            3
 8     2 A            3
 9     2 B            3
10     2 A            3
# ... with 13 more rows

CodePudding user response：

Base solution with the aggregate function.

df<- data.frame( id= c(1, 1, 1, 1,  2, 2, 2, 2,  2, 2, 3, 3, 3,3 ,3,3,4,4,4, 5,5,6,6), value= c(NA, NA,"A", "A", NA,NA,"A","A","B","A",NA,NA,"B","A","B","A", NA, NA,"B",NA, NA, NA,NA))

# Calculate the number of A for each group id:
countA = aggregate(value ~ id, data=df, FUN=function(x){sum(x=="A", na.rm=TRUE))}, na.omit=na.pass)
countA
#   id value
# 1  1     2
# 2  2     3
# 3  3     2
# 4  4     0
# 5  5     0
# 6  6     0

# Set the value in countA to "countA" and merge with df
names(countA)[2] = "countA"
merge(df, countA, by="id")
#   id value countA
#1   1  <NA>      2
#2   1  <NA>      2
#3   1     A      2
#4   1     A      2
#5   2  <NA>      3
#6   2  <NA>      3
# ...

Explanation:

aggregate calculates a summary function for groups defined by by variable. Alternatively, this relationship can be provided with a formula.

The function(x){sum(x=="A", na.rm=TRUE)} simply calculates a sum of A values and removes the NAs that would otherwise bubble up.

Finally, by default, aggregate removes the values with NAs, which cause some groups to not be represented. This is fixed by the rule na.omit=na.pass which suppress this behaviour.

After that, we just rename a column in our aggregated result and merge the two data.frames by the id column.

CodePudding user response：

Another solution:

df %>% 
  add_count(id, wt = value=="A", name = "number_A")