I try to count the number of A's
within-group id.
df<- data.frame( id= c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3,3 ,3,3,4,4,4, 5,5,6,6), value= c(NA, NA,"A", "A", NA,NA,"A","A","B","A",NA,NA,"B","A","B","A", NA, NA,"B",NA, NA, NA,NA))
Desired Output
id value number_A
1 NA 2
1 NA 2
1 A 2
1 A 2
2 NA 3
2 NA 3
2 A 3
2 A 3
2 B 3
2 A 3
3 NA 2
3 NA 2
3 B 2
3 A 2
3 B 2
3 A 2
4 NA 0
4 NA 0
4 B 0
5 NA 0
5 NA 0
6 NA 0
6 NA 0
I try it with the following code:
library(dplyr)
df1 <- df %>% group_by(id) %>%
mutate(count = row_number(value=="A"))
CodePudding user response:
You could use
library(dplyr)
df %>%
group_by(id) %>%
mutate(number_A = sum(value == "A", na.rm = TRUE)) %>%
ungroup()
This returns
# A tibble: 23 x 3
id value number_A
<dbl> <chr> <int>
1 1 NA 2
2 1 NA 2
3 1 A 2
4 1 A 2
5 2 NA 3
6 2 NA 3
7 2 A 3
8 2 A 3
9 2 B 3
10 2 A 3
# ... with 13 more rows
CodePudding user response:
Base solution with the aggregate
function.
df<- data.frame( id= c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3,3 ,3,3,4,4,4, 5,5,6,6), value= c(NA, NA,"A", "A", NA,NA,"A","A","B","A",NA,NA,"B","A","B","A", NA, NA,"B",NA, NA, NA,NA))
# Calculate the number of A for each group id:
countA = aggregate(value ~ id, data=df, FUN=function(x){sum(x=="A", na.rm=TRUE))}, na.omit=na.pass)
countA
# id value
# 1 1 2
# 2 2 3
# 3 3 2
# 4 4 0
# 5 5 0
# 6 6 0
# Set the value in countA to "countA" and merge with df
names(countA)[2] = "countA"
merge(df, countA, by="id")
# id value countA
#1 1 <NA> 2
#2 1 <NA> 2
#3 1 A 2
#4 1 A 2
#5 2 <NA> 3
#6 2 <NA> 3
# ...
Explanation:
aggregate
calculates a summary function for groups defined by by
variable. Alternatively, this relationship can be provided with a formula.
The function(x){sum(x=="A", na.rm=TRUE)}
simply calculates a sum of A
values and removes the NA
s that would otherwise bubble up.
Finally, by default, aggregate removes the values with NA
s, which cause some groups to not be represented. This is fixed by the rule na.omit=na.pass
which suppress this behaviour.
After that, we just rename a column in our aggregated result and merge
the two data.frames
by the id
column.
CodePudding user response:
Another solution:
df %>%
add_count(id, wt = value=="A", name = "number_A")