Evenly divide certain values depending on the number of rows in a group (R)

I have quite an advanced data wrangling issue in R, and I hope you can help me with it. I have a data frame with a column called "Marker", for which I know three things:

The amount of time between start_trial and tone_onset (variable amount of ms)
The amount of time between tone_onset and stimulus_onset (500ms)
The amount of time between stimulus_onset and end_trial (1000ms)

I want to create a column that keeps track of the elapsed time per trial. Unfortunately, the number of rows between the markers is not consistent with the elapsed time. Therefore, what I want to do is spread the known duration of each segment evenly over its rows. For example, one trial might have 50 rows between tone_onset and stimulus_onset, so each row should advance the trial time by 10ms. Another trial might have 100 rows in between, so each row should advance it by 5ms. Furthermore, I want to continue counting the elapsed time until the start of the next trial (so the time between end_trial and the next start_trial). On top of that, I want the counting for each trial to be centred on stimulus_onset (so everything before counts as negative, everything after as positive). Lastly, I want to label the rows according to their trial numbers. Dataframes speak better than words, so here is a very simplified example:

df <- data.frame(Marker = c("start_trial", "", "", "start_tone", "", "", "", "", "start_stimulus", "", "", "", "", "", "", "end_trial", "", "start_trial", "", "", "", "start_tone", "", "", "", "start_stimulus", "", "", "", "end_trial", "", ""))

As said before, the time between tone_onset and stimulus_onset is always 500ms, and the time between stimulus_onset and end_trial is always 1000ms. The time between start_trial and tone_onset is variable however. I have a separate data frame with a list of the times between start_trial and tone_onset for each trial:

trial_interval <- (Trial_Interval = c("395", "505"))

What I want to end up with is the following:

df2 <- data.frame(Marker = c("start_trial", "", "", "start_tone", "", "", "", "", "start_stimulus", "", "", "", "", "", "", "end_trial", "", "start_trial", "", "", "", "start_tone", "", "", "", "start_stimulus", "", "", "", "end_trial", "", ""),
                    TrialTime = c(-895, -763.3, -631.7, -500, -400, -300, -200, -100, 0, 142.9, 285.7, 428.6, 571.4, 714.3, 857.1, 1000, 1142.9, -1005, -878.75, -752.5, -626.25, -500, -375, -250, -125, 0, 250, 500, 750, 1000, 1250, 1500),
                    Trial = c("Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2")
                    )

I tried to simplify this complex problem as best I could. Let me know if I need to elaborate on something! Thanks a lot, I've been struggling with this for quite some time now.

CodePudding user response:

Step 1: Create a smaller frame from the original df that contains information that can be leveraged to get the steps between Markers

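library(dplyr)  # left_join(), mutate(), filter(), group_by(), tibble(), ...
library(tidyr)  # fill()

data = left_join(
  # every row of df, tagged with its row number
  df %>% mutate(id = row_number()),
  # marker rows only, each tagged with its trial number
  df %>% mutate(Marker = ifelse(Marker == "", NA, Marker)) %>% 
    mutate(id = row_number()) %>% 
    filter(!is.na(Marker)) %>% 
    mutate(Trial = cumsum(Marker == "start_trial")),
  by = c("Marker", "id")
) %>% 
  fill(Trial) %>%                 # carry the trial number down to the unmarked rows
  group_by(Trial) %>% 
  mutate(max_row = max(id)) %>%   # last row of each trial
  filter(Marker != "") %>%        # keep only the four marker rows per trial
  # attach the start_trial -> start_tone interval (ms) for each trial
  inner_join(tibble("interval" = as.numeric(Trial_Interval)) %>% mutate(Trial = row_number()), by = "Trial")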

Step 2: Create a function that can take each Trial-based subset of data, and return the Trial time

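f <- function(df, ...) {
  id = df[["id"]]                        # row numbers of the four markers in this trial
  m_row = max(df[["max_row"]]) - id[4]   # number of rows after end_trial
  intv = unique(df[["interval"]])        # start_trial -> start_tone time (ms)
  
  n1 = seq(-500, (-500 - intv), length.out = id[2] - id[1] + 1)  # start_trial to start_tone
  n2 = seq(0, -500, length.out = id[3] - id[2] + 1)              # start_tone to start_stimulus
  n3 = seq(0, 1000, length.out = id[4] - id[3] + 1)              # start_stimulus to end_trial
  # after end_trial, keep counting with the same step as the last segment
  if(m_row > 0) n4 = seq(1000, by = n3[2] - n3[1], length.out = m_row + 1)
  else n4 = 0
  # the segments share their end points, so drop duplicates and sort
  result = unique(c(n1, n2, n3, n4))
  tibble(Trial_Time = result[order(result)])
}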
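
To make the arithmetic inside the function easier to follow, here is a minimal base R sketch of what a single seq() call does for one segment. The helper name segment_times is hypothetical and only for illustration; it is not used by the code above.

```r
# Hypothetical helper (illustration only): spread the rows between two
# markers evenly over a known time span.
segment_times <- function(from_ms, to_ms, n_rows) {
  # n_rows counts both marker rows, so there are n_rows - 1 equal steps
  seq(from_ms, to_ms, length.out = n_rows)
}

# Trial 1 of the example: start_tone is row 4 and start_stimulus is row 9,
# so the segment covers 9 - 4 + 1 = 6 rows and 500 ms.
tone_to_stim <- segment_times(-500, 0, 9 - 4 + 1)
print(tone_to_stim)        # -500 -400 -300 -200 -100 0
print(diff(tone_to_stim))  # every step is 100 ms
```

Because neighbouring segments share their marker row, the end point of one sequence is the start point of the next, which is why f removes duplicates before sorting.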

Step 3: Apply that function to the groups of data, and cbind with the original frame

cbind(df, data %>% group_modify(f)) %>% 
  relocate(Marker, Trial_Time, Trial) %>% 
  mutate(Trial = paste0("Trial",Trial))

Output:

           Marker Trial_Time  Trial
1     start_trial  -895.0000 Trial1
2                  -763.3333 Trial1
3                  -631.6667 Trial1
4      start_tone  -500.0000 Trial1
5                  -400.0000 Trial1
6                  -300.0000 Trial1
7                  -200.0000 Trial1
8                  -100.0000 Trial1
9  start_stimulus     0.0000 Trial1
10                  142.8571 Trial1
11                  285.7143 Trial1
12                  428.5714 Trial1
13                  571.4286 Trial1
14                  714.2857 Trial1
15                  857.1429 Trial1
16      end_trial  1000.0000 Trial1
17                 1142.8571 Trial1
18    start_trial -1005.0000 Trial2
19                 -878.7500 Trial2
20                 -752.5000 Trial2
21                 -626.2500 Trial2
22     start_tone  -500.0000 Trial2
23                 -375.0000 Trial2
24                 -250.0000 Trial2
25                 -125.0000 Trial2
26 start_stimulus     0.0000 Trial2
27                  250.0000 Trial2
28                  500.0000 Trial2
29                  750.0000 Trial2
30      end_trial  1000.0000 Trial2
31                 1250.0000 Trial2
32                 1500.0000 Trial2
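
For comparison, here is a sketch of an alternative that needs only base R. It is not the method used above: it treats the four markers of each trial as known (row, time) points and lets approx() interpolate linearly between them, then continues past end_trial with the step size of the last segment. The names intervals, trial_time and result are assumptions made for this example, and it assumes every trial contains all four markers.

```r
df <- data.frame(Marker = c("start_trial", "", "", "start_tone", "", "", "", "",
                            "start_stimulus", "", "", "", "", "", "", "end_trial", "",
                            "start_trial", "", "", "", "start_tone", "", "", "",
                            "start_stimulus", "", "", "", "end_trial", "", ""))
intervals <- c(395, 505)  # ms between start_trial and start_tone, one per trial

# Trial number for every row: increases by one at each start_trial
trial <- cumsum(df$Marker == "start_trial")

trial_time <- unlist(lapply(split(seq_len(nrow(df)), trial), function(rows) {
  k   <- trial[rows[1]]                       # trial number of this block
  m   <- df$Marker[rows]
  # Row positions of the four markers within the trial and their known times
  pos <- match(c("start_trial", "start_tone", "start_stimulus", "end_trial"), m)
  t   <- c(-500 - intervals[k], -500, 0, 1000)
  i   <- seq_along(rows)
  out <- approx(pos, t, xout = i)$y           # NA for rows after end_trial
  # Continue counting after end_trial with the step of the last segment
  step  <- 1000 / (pos[4] - pos[3])
  after <- i > pos[4]
  out[after] <- 1000 + (i[after] - pos[4]) * step
  out
}), use.names = FALSE)

result <- data.frame(Marker = df$Marker,
                     TrialTime = trial_time,
                     Trial = paste0("Trial", trial))
print(result)
```

On the example data this should give the same times as the output above; the main difference is that the marker times are stated once, as a vector, instead of being built up from separate seq() calls.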