I am trying to add a column to a df that holds a count of a pair of strings that are repeated over multiple rows. The count needs to reset based on a change in another column.
More specifically: I am trying to add trial numbers to a very large data frame. Each trial consists of 2 parts (show followed by point), show and point are each associated with a value, and there can be an arbitrary number of show/point values per trial. Each ID can have a different number of trials, but each trial will always have a show, followed by a point. This means each ID will have a different number of rows.
Sample data:
ID <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
TrialType <- c("Show", "Show", "Show", "Point", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point")
Value <- c(0.52, 0.54, 0.55, 0.57, 0.58, 0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.55, 0.57, 0.58, 0.59,0.75,0.89, 0.99)
df<-as.data.frame(c(ID, TrialType, Value))
TrialNumber<-c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
df.desired <- cbind(ID, TrialType, Value, TrialNumber)
I think I need have a loop that goes through ID, but this is too advanced for me to figure out. I am new to R and to stackoverflow. Thank you in advance for your help.
CodePudding user response:
Using tidyverse:
within each id
, Check whether the current value is Point
and the previous value is Show
. If that is the case, start a new count.
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(TrialNumber = TrialType == 'Show' &
lag(TrialType, default = 'Point') == 'Point',
TrialNumber = cumsum(TrialNumber))
ID TrialType Value TrialNumber
1 1 Show 0.52 1
2 1 Show 0.54 1
3 1 Show 0.55 1
4 1 Point 0.57 1
5 1 Point 0.58 1
6 1 Point 0.59 1
7 1 Point 0.75 1
8 1 Show 0.89 2
9 1 Show 0.32 2
10 1 Show 0.99 2
11 1 Show 0.01 2
12 1 Point 0.02 2
13 1 Show 0.56 3
14 1 Show 0.67 3
15 1 Point 0.32 3
16 2 Show 0.59 1
17 2 Show 0.75 1
18 2 Show 0.89 1
19 2 Point 0.32 1
20 2 Point 0.99 1
21 2 Point 0.01 1
22 2 Show 0.02 2
23 2 Show 0.56 2
24 2 Show 0.67 2
25 2 Show 0.32 2
26 2 Point 0.55 2
27 2 Show 0.57 3
28 2 Show 0.58 3
29 2 Show 0.59 3
30 2 Point 0.75 3
31 2 Point 0.89 3
32 2 Point 0.99 3
CodePudding user response:
You can use rleid
from data.table
:
library(dplyr)
library(data.table)
df %>%
mutate(tmp = data.table::rleid(TrialType),
tmp = ifelse(TrialType == "Point", tmp - 1, tmp)) %>%
group_by(ID) %>%
mutate(TrialNumber = data.table::rleid(tmp)) %>%
select(-tmp) %>%
ungroup()
Which gives:
ID TrialType Value TrialNumber
<dbl> <chr> <dbl> <int>
1 1 Show 0.52 1
2 1 Show 0.54 1
3 1 Show 0.55 1
4 1 Point 0.57 1
5 1 Point 0.58 1
6 1 Point 0.59 1
7 1 Point 0.75 1
8 1 Show 0.89 2
9 1 Show 0.32 2
10 1 Show 0.99 2
11 1 Show 0.01 2
12 1 Point 0.02 2
13 1 Show 0.56 3
14 1 Show 0.67 3
15 1 Point 0.32 3
16 2 Show 0.59 1
17 2 Show 0.75 1
18 2 Show 0.89 1
19 2 Point 0.32 1
20 2 Point 0.99 1
21 2 Point 0.01 1
22 2 Show 0.02 2
23 2 Show 0.56 2
24 2 Show 0.67 2
25 2 Show 0.32 2
26 2 Point 0.55 2
27 2 Show 0.57 3
28 2 Show 0.58 3
29 2 Show 0.59 3
30 2 Point 0.75 3
31 2 Point 0.89 3
32 2 Point 0.99 3