R: Only keep rows up to a certain condition

Time:11-09

I have a data frame as follows:

   head(data)

   subject block trial timeLeft timeRight stim1 stim2 Chosen
1        1    13     0        0         0     2     1      2
2        1    13     1        0         1     3     2      2
3        1    13     3        0         0     3     1      1
4        1    13     4        2         0     2     3      3
5        1    13     6        1         1     1     3      1
6        1    13     7        2         2     2     1      1
...   

     
454     1006    14     0        0         0     6     5      5
455     1006    14     1        0         0     6     4      6
456     1006    14     3        0         1     4     5      4
457     1006    14     4        1         1     4     5      4
458     1006    14     6        1         2     6     4      6
  

My objective is to group by subject and block and to keep only the rows up to and including the row where both timeLeft and timeRight equal 0.

In this case the output would be:

       subject block trial timeLeft timeRight stim1 stim2 Chosen
    1        1    13     0        0         0     2     1      2
    2        1    13     1        0         1     3     2      2
    3        1    13     3        0         0     3     1      1
...
   454     1006    14     0        0         0     6     5      5
   455     1006    14     1        0         0     6     4      6

Thank you in advance!

Here is the structure of the data:

'data.frame':   64748 obs. of  8 variables:
 $ subject  : num  1 1 1 1 1 1 1 1 1 1 ...
 $ block    : int  13 13 13 13 13 13 13 13 13 13 ...
 $ trial    : int  0 1 3 4 6 7 9 10 12 13 ...
 $ timeLeft : int  0 0 0 2 1 2 2 1 3 4 ...
 $ timeRight: int  0 1 0 0 1 2 1 3 4 4 ...
 $ stim1    : int  2 3 3 2 1 2 2 3 2 2 ...
 $ stim2    : int  1 2 1 3 3 1 3 1 1 1 ...
 $ Chosen   : int  2 2 1 3 1 1 2 1 2 2 ...

CodePudding user response:

You can do this with the help of a custom function:

library(dplyr)

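# Return the row positions from the first to the second (0, 0) row of a group.
# Return 0 (which selects no rows) if the group has fewer than two such rows.
select_rows <- function(timeLeft, timeRight) {
  inds <- which(timeLeft == 0 & timeRight == 0)
  if (length(inds) >= 2) inds[1]:inds[2]
  else 0
}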


data %>%
  group_by(subject, block) %>%
  slice(select_rows(timeLeft, timeRight)) %>%
  ungroup()

#   subject block trial timeLeft timeRight stim1 stim2 Chosen
#    <int> <int> <int>    <int>     <int> <int> <int>  <int>
#1       1    13     0        0         0     2     1      2
#2       1    13     1        0         1     3     2      2
#3       1    13     3        0         0     3     1      1
#4    1006    14     0        0         0     6     5      5
#5    1006    14     1        0         0     6     4      6
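
To see what the helper returns on its own, here is a small check on plain vectors copied from the sample rows. The third call uses a made-up group with only one (0, 0) row, which is an assumption for illustration and not part of the original data.

```r
# Same helper as above, repeated so this snippet runs on its own.
select_rows <- function(timeLeft, timeRight) {
  inds <- which(timeLeft == 0 & timeRight == 0)
  if (length(inds) >= 2) inds[1]:inds[2] else 0
}

# subject 1, block 13: rows 1 and 3 are (0, 0), so positions 1 to 3 are kept
g1 <- select_rows(c(0, 0, 0, 2, 1, 2), c(0, 1, 0, 0, 1, 2))

# subject 1006, block 14: rows 1 and 2 are (0, 0), so positions 1 to 2 are kept
g2 <- select_rows(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 2))

# hypothetical group with a single (0, 0) row: falls back to 0,
# and slice(0) then keeps no rows for that group
g3 <- select_rows(c(0, 1, 2), c(0, 1, 2))
```

Note that the helper stops at the second (0, 0) row of each group; a group with fewer than two such rows is dropped entirely.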

If the data is huge, you can also do this with data.table:

library(data.table)
setDT(data)[, .SD[select_rows(timeLeft, timeRight)], .(subject, block)]

data

It is easier to help if you provide data in a reproducible format:

data <- structure(list(subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1006L, 1006L, 
1006L, 1006L, 1006L), block = c(13L, 13L, 13L, 13L, 13L, 13L, 
14L, 14L, 14L, 14L, 14L), trial = c(0L, 1L, 3L, 4L, 6L, 7L, 0L, 
1L, 3L, 4L, 6L), timeLeft = c(0L, 0L, 0L, 2L, 1L, 2L, 0L, 0L, 
0L, 1L, 1L), timeRight = c(0L, 1L, 0L, 0L, 1L, 2L, 0L, 0L, 1L, 
1L, 2L), stim1 = c(2L, 3L, 3L, 2L, 1L, 2L, 6L, 6L, 4L, 4L, 6L
), stim2 = c(1L, 2L, 1L, 3L, 3L, 1L, 5L, 4L, 5L, 5L, 4L), Chosen = c(2L, 
2L, 1L, 3L, 1L, 1L, 5L, 6L, 4L, 4L, 6L)), class = "data.frame", row.names = 
c("1", "2", "3", "4", "5", "6", "454", "455", "456", "457", "458"))

CodePudding user response:

If you want to keep all rows up to and including the last one where timeLeft and timeRight are both 0, you can try it this way.

Data

   subject block trial timeLeft timeRight stim1 stim2 Chosen
1        1    13     0        0         0     2     1      2
2        1    13     1        0         1     3     2      2
3        1    13     3        0         0     3     1      1
4        1    13     4        2         0     2     3      3
5        1    13     6        1         1     1     3      1
6        1    13     7        2         2     2     1      1
7     1006    14     0        0         1     6     5      5
8     1006    14     0        0         0     6     5      5
9     1006    14     1        0         0     6     4      6
10    1006    14     3        0         1     4     5      4
11    1006    14     4        1         1     4     5      4
12    1006    14     6        1         2     6     4      6

I added one more row for subject 1006 so that its first row is not (0, 0).

Code

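df %>%
  group_by(subject, block) %>%
  # key = position of the last (0, 0) row within each group
  mutate(key = max(which(timeLeft == 0 & timeRight == 0))) %>%
  # key is constant within a group, so use its first value as the cut-off
  slice(1:first(key))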

  subject block trial timeLeft timeRight stim1 stim2 Chosen   key
    <int> <int> <int>    <int>     <int> <int> <int>  <int> <int>
1       1    13     0        0         0     2     1      2     3
2       1    13     1        0         1     3     2      2     3
3       1    13     3        0         0     3     1      1     3
4    1006    14     0        0         1     6     5      5     3
5    1006    14     0        0         0     6     5      5     3
6    1006    14     1        0         0     6     4      6     3
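
One thing to watch with `max(which(...))` is a group that never reaches (0, 0): `which()` is then empty and `max()` returns `-Inf` with a warning. The following is a minimal base R sketch of the same "keep through the last (0, 0) row" idea that drops such groups instead. The function name `keep_through_last_hit` and the extra subject 2000 are hypothetical, added only for illustration, and only the columns needed for the logic are included.

```r
# Rows mirroring the Data table above, plus one hypothetical group
# (subject 2000) that never reaches (0, 0).
df <- data.frame(
  subject   = c(rep(1L, 6), rep(1006L, 6), rep(2000L, 2)),
  block     = c(rep(13L, 6), rep(14L, 6), rep(15L, 2)),
  timeLeft  = c(0L, 0L, 0L, 2L, 1L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 2L),
  timeRight = c(0L, 1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 1L, 1L, 2L, 1L, 0L)
)

keep_through_last_hit <- function(d) {
  # One key per subject/block combination
  g <- paste(d$subject, d$block, sep = "_")
  hit <- d$timeLeft == 0 & d$timeRight == 0
  # Position of each row within its group
  pos <- ave(seq_len(nrow(d)), g, FUN = seq_along)
  # Position of the last (0, 0) row per group; 0 when the group has none
  last_hit <- ave(ifelse(hit, pos, 0L), g, FUN = max)
  d[pos <= last_hit, , drop = FALSE]
}

result <- keep_through_last_hit(df)
```

Here `result` holds the first three rows of subject 1 and of subject 1006, and none of the rows of the hypothetical subject 2000.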

CodePudding user response:

You can filter for only the rows that meet the condition and then group. Note that this keeps just the rows where both values are 0, not the rows in between them:

data %>% 
    filter(timeLeft == 0 & timeRight == 0) %>%
    group_by(subject, block)