Home > OS >  R: count pairs of strings in one column and reset count based on different column
R: count pairs of strings in one column and reset count based on different column

Time:06-11

I am trying to add a column to a df that holds a count of a pair of strings that are repeated over multiple rows. The count needs to reset based on a change in another column.

More specifically: I am trying to add trial numbers to a very large data frame. Each trial consists of 2 parts (show followed by point), show and point are each associated with a value, and there can be an arbitrary number of show/point values per trial. Each ID can have a different number of trials, but each trial will always have a show, followed by a point. This means each ID will have a different number of rows.

Sample data:

ID <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
TrialType <- c("Show", "Show", "Show", "Point", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point")
Value <- c(0.52, 0.54, 0.55, 0.57, 0.58, 0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.55, 0.57, 0.58, 0.59,0.75,0.89, 0.99)
df<-as.data.frame(c(ID, TrialType, Value))
TrialNumber<-c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3) 
df.desired <- cbind(ID, TrialType, Value, TrialNumber)

I think I need have a loop that goes through ID, but this is too advanced for me to figure out. I am new to R and to stackoverflow. Thank you in advance for your help.

CodePudding user response:

Using tidyverse:

within each id, Check whether the current value is Point and the previous value is Show. If that is the case, start a new count.

 library(tidyverse)
 df %>%
     group_by(ID) %>%
     mutate(TrialNumber = TrialType == 'Show' &
                lag(TrialType, default = 'Point') == 'Point',
            TrialNumber = cumsum(TrialNumber))

   ID TrialType Value TrialNumber
1   1      Show  0.52           1
2   1      Show  0.54           1
3   1      Show  0.55           1
4   1     Point  0.57           1
5   1     Point  0.58           1
6   1     Point  0.59           1
7   1     Point  0.75           1
8   1      Show  0.89           2
9   1      Show  0.32           2
10  1      Show  0.99           2
11  1      Show  0.01           2
12  1     Point  0.02           2
13  1      Show  0.56           3
14  1      Show  0.67           3
15  1     Point  0.32           3
16  2      Show  0.59           1
17  2      Show  0.75           1
18  2      Show  0.89           1
19  2     Point  0.32           1
20  2     Point  0.99           1
21  2     Point  0.01           1
22  2      Show  0.02           2
23  2      Show  0.56           2
24  2      Show  0.67           2
25  2      Show  0.32           2
26  2     Point  0.55           2
27  2      Show  0.57           3
28  2      Show  0.58           3
29  2      Show  0.59           3
30  2     Point  0.75           3
31  2     Point  0.89           3
32  2     Point  0.99           3

CodePudding user response:

You can use rleid from data.table:

library(dplyr)
library(data.table)

df %>% 
  mutate(tmp = data.table::rleid(TrialType),
         tmp = ifelse(TrialType == "Point", tmp - 1, tmp)) %>% 
  group_by(ID) %>% 
  mutate(TrialNumber = data.table::rleid(tmp)) %>% 
  select(-tmp) %>%
  ungroup()

Which gives:

     ID TrialType Value TrialNumber
   <dbl> <chr>     <dbl>       <int>
 1     1 Show       0.52           1
 2     1 Show       0.54           1
 3     1 Show       0.55           1
 4     1 Point      0.57           1
 5     1 Point      0.58           1
 6     1 Point      0.59           1
 7     1 Point      0.75           1
 8     1 Show       0.89           2
 9     1 Show       0.32           2
10     1 Show       0.99           2
11     1 Show       0.01           2
12     1 Point      0.02           2
13     1 Show       0.56           3
14     1 Show       0.67           3
15     1 Point      0.32           3
16     2 Show       0.59           1
17     2 Show       0.75           1
18     2 Show       0.89           1
19     2 Point      0.32           1
20     2 Point      0.99           1
21     2 Point      0.01           1
22     2 Show       0.02           2
23     2 Show       0.56           2
24     2 Show       0.67           2
25     2 Show       0.32           2
26     2 Point      0.55           2
27     2 Show       0.57           3
28     2 Show       0.58           3
29     2 Show       0.59           3
30     2 Point      0.75           3
31     2 Point      0.89           3
32     2 Point      0.99           3
  • Related