I am struggling with this task: I have this dataframe:
df <- structure(list(col1 = c("A", "A", "A", "B", "A", "A", "C", "A"
)), class = "data.frame", row.names = c(NA, -8L))
col1
1 A
2 A
3 A
4 B
5 A
6 A
7 C
8 A
I want to get the count of A in the first sequence only.
The expected answer is 3!
Update: my attempt, which does not give the expected output:
df %>%
summarise(first_sequence_A = sum(col1=="A"))
# not working because it counts all A's
# resulting in:
first_sequence_A
1 6
expected:
first_sequence_A
1 3
I prefer a solution with dplyr
I have tried cumsum, rle, lag ... but I can't get it!
CodePudding user response:
Not sure what your ideal final output would look like, but maybe something like this?
Edit: there is probably a better and more succinct way to do this, but...
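library(dplyr)
library(data.table) # for rleid()
df %>%
  mutate(x = rleid(col1)) %>% # run id: increments whenever col1 changes
  group_by(col1, x) %>%
  tally() %>% # length of each run; result stays grouped by col1
  slice(1) %>% # first run of each letter
  filter(col1 == "A") %>%
  summarize(first_sequence_A = n)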
Gives us:
# A tibble: 1 x 2
col1 first_sequence_A
<chr> <int>
1 A 3
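To see why the pipeline above works, it may help to look at the intermediate table of runs. The following is only a sketch, assuming dplyr and data.table are installed; the names runs, run_length and first_run_A are made up for illustration and are not part of the answer.

```r
library(dplyr)
library(data.table)

df <- data.frame(col1 = c("A", "A", "A", "B", "A", "A", "C", "A"))

# One row per run: the letter, its run id, and how long the run is.
# run_length is a hypothetical column name chosen for this sketch.
runs <- df %>%
  mutate(x = rleid(col1)) %>%
  count(col1, x, name = "run_length")

# Keep only the runs of "A", then take the one with the smallest run id,
# i.e. the run that appears first in the data.
first_run_A <- runs %>%
  filter(col1 == "A") %>%
  slice_min(x, n = 1) %>%
  pull(run_length)

first_run_A
```

Picking the run with slice_min(x) makes the "first" part explicit instead of relying on the row order produced by the grouping.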
CodePudding user response:
We can use rle from base R:
with(rle(df$col1 == "A"), lengths[values][1])
[1] 3
Or in dplyr syntax:
df %>%
summarise(first_sequence_A = with(rle(col1 == "A"), lengths[values][1]))
first_sequence_A
1 3
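For readers unfamiliar with rle, the one-liner above can be unpacked step by step. This is just an illustrative base R breakdown; the variable names r, a_runs and first_run_A are my own and not part of the answer.

```r
df <- data.frame(col1 = c("A", "A", "A", "B", "A", "A", "C", "A"))

# Run-length encode the logical vector "is this row an A?"
r <- rle(df$col1 == "A")
r$values   # TRUE FALSE TRUE FALSE TRUE
r$lengths  # 3 1 2 1 1

# Keep only the lengths of the TRUE runs, i.e. the runs of "A"
a_runs <- r$lengths[r$values]  # 3 2 1

# The first element is the length of the first run of "A"
first_run_A <- a_runs[1]
first_run_A  # 3
```

Note that if col1 contains no "A" at all, a_runs is empty and a_runs[1] is NA, so that case may need separate handling.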