I have quite an advanced data wrangling problem in R and I hope you can help me with it. I have a data frame with a column called "Marker", for which I know three things:
- The time between start_trial and tone_onset (marked start_tone in the data): a variable number of ms
- The time between tone_onset and stimulus_onset (marked start_stimulus in the data): 500 ms
- The time between stimulus_onset and end_trial: 1000 ms
I want to create a column that tracks the elapsed time within each trial. Unfortunately, the number of rows between the markers does not correspond to the elapsed time, so I want to spread each segment's known duration evenly over its rows. For example, one trial might have 50 rows between tone_onset and stimulus_onset, so each row should advance the trial time by 10 ms (500 / 50). Another trial might have 100 rows in between, so each row should advance it by 5 ms. Furthermore, I want to keep counting the elapsed time after end_trial until the start of the next trial. On top of that, I want the counting in each trial to be centred on stimulus_onset (everything before counts negative, everything after counts positive). Lastly, I want to label each trial with its trial number. Data frames speak louder than words, so here is a very simplified example:
df <- data.frame(Marker = c("start_trial", "", "", "start_tone", "", "", "", "", "start_stimulus", "", "", "", "", "", "", "end_trial", "", "start_trial", "", "", "", "start_tone", "", "", "", "start_stimulus", "", "", "", "end_trial", "", ""))
As said before, the time between tone_onset and stimulus_onset is always 500 ms, and the time between stimulus_onset and end_trial is always 1000 ms. The time between start_trial and tone_onset, however, is variable. I have the times between start_trial and tone_onset for each trial stored separately:
Trial_Interval <- c(395, 505)  # ms, one value per trial
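With these intervals, every marker has a fixed time relative to stimulus onset: start_trial at -(500 + interval), start_tone at -500, start_stimulus at 0 and end_trial at 1000. This matches the first value of each trial in the expected output. A small sketch of that arithmetic; `anchor_times` is a hypothetical helper, not part of either post:

```r
# Times between start_trial and start_tone, one per trial (ms)
interval_ms <- as.numeric(c("395", "505"))

# Hypothetical helper: time of each marker relative to stimulus onset
anchor_times <- function(interval) {
  c(start_trial = -(500 + interval),  # variable interval plus the fixed 500 ms tone -> stimulus gap
    start_tone = -500,
    start_stimulus = 0,
    end_trial = 1000)
}

# One column per trial; the start_trial row is -895 for trial 1 and -1005 for trial 2
anchors <- sapply(interval_ms, anchor_times)
```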
What I want to end up with is the following:
df2 <- data.frame(Marker = c("start_trial", "", "", "start_tone", "", "", "", "", "start_stimulus", "", "", "", "", "", "", "end_trial", "", "start_trial", "", "", "", "start_tone", "", "", "", "start_stimulus", "", "", "", "end_trial", "", ""),
TrialTime = c(-895, -763.3, -631.7, -500, -400, -300, -200, -100, 0, 142.9, 285.7, 428.6, 571.4, 714.3, 857.1, 1000, 1142.9, -1005, -878.75, -752.5, -626.25, -500, -375, -250, -125, 0, 250, 500, 750, 1000, 1250, 1500),
Trial = c("Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2")
)
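The expected TrialTime values amount to linear interpolation between the marker times: within each segment, the segment's duration is divided by the number of row steps between its two markers. A minimal base-R sketch of that rule, assuming "rows between" means the row distance between the two markers (as in the example, where start_tone on row 4 and start_stimulus on row 9 give 100 ms steps); `segment_times` is a hypothetical name:

```r
# Hypothetical helper: spread one segment's known duration evenly over
# its rows. n_steps is the row distance between the two markers, so the
# result has n_steps + 1 values, one per row including both marker rows.
segment_times <- function(from_ms, to_ms, n_steps) {
  seq(from_ms, to_ms, length.out = n_steps + 1)
}

# Trial1: start_tone (row 4) -> start_stimulus (row 9) is 5 steps of 100 ms
tone_to_stim <- segment_times(-500, 0, 9 - 4)

# Trial1: start_stimulus (row 9) -> end_trial (row 16) is 7 steps of 1000/7 ms
stim_to_end <- segment_times(0, 1000, 16 - 9)

# The example from the question: 50 steps give 10 ms per row, 100 steps give 5 ms
step_50 <- diff(segment_times(-500, 0, 50))[1]
step_100 <- diff(segment_times(-500, 0, 100))[1]
```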
I tried to simplify this complex problem as best as I could. Let me know if I need to elaborate on something! Thanks a lot, I've been struggling with this for quite some time now.
Answer:
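Step 1: Create a smaller frame from the original df that contains the information needed to compute the steps between markers: the row id of every marker, its trial number, the last row of each trial and that trial's interval.
library(dplyr)
library(tidyr)  # for fill()

data = left_join(
  df %>% mutate(id = row_number()),
  # only the marker rows, numbered by trial
  df %>% mutate(Marker = ifelse(Marker == "", NA, Marker)) %>%
    mutate(id = row_number()) %>%
    filter(!is.na(Marker)) %>%
    mutate(Trial = cumsum(Marker == "start_trial")),
  by = c("Marker", "id")
) %>%
  fill(Trial) %>%                  # blank rows inherit the trial number above them
  group_by(Trial) %>%
  mutate(max_row = max(id)) %>%    # last row belonging to each trial
  filter(Marker != "") %>%         # keep one row per marker
  # attach each trial's start_trial -> start_tone interval
  inner_join(tibble("interval" = as.numeric(Trial_Interval)) %>% mutate(Trial = row_number()), by = "Trial")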
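Step 1 hinges on `cumsum(Marker == "start_trial")`: the counter increases by one at every start_trial row, so it numbers the trials. Applied to the whole Marker column (blank rows included) it already labels every row, which is what the join plus `fill()` reconstructs. A base-R sketch of that idea, using a plain copy of the example's Marker column so the original `df` is left untouched:

```r
# The Marker column from the example data frame
markers <- c("start_trial", "", "", "start_tone", "", "", "", "",
             "start_stimulus", "", "", "", "", "", "", "end_trial", "",
             "start_trial", "", "", "", "start_tone", "", "", "",
             "start_stimulus", "", "", "", "end_trial", "", "")

# Each start_trial row bumps the counter, so every row (blank or not)
# inherits the number of the most recent start_trial above it.
trial <- cumsum(markers == "start_trial")

# Row positions of the markers, split by trial:
# trial 1 -> rows 1, 4, 9, 16; trial 2 -> rows 18, 22, 26, 30
marker_rows <- which(markers != "")
rows_by_trial <- split(marker_rows, trial[marker_rows])
```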
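Step 2: Create a function that takes each Trial-based subset of data and returns the trial time for every row of that trial.
f <- function(df, ...) {
  id = df[["id"]]                          # rows of start_trial, start_tone, start_stimulus, end_trial
  m_row = max(df[["max_row"]]) - id[4]     # number of rows after end_trial
  intv = unique(df[["interval"]])          # start_trial -> start_tone interval (ms)
  n2 = seq(0, -500, length.out = id[3] - id[2] + 1)             # start_tone -> start_stimulus
  n1 = seq(-500, (-500 - intv), length.out = id[2] - id[1] + 1)  # start_trial -> start_tone
  n3 = seq(0, 1000, length.out = id[4] - id[3] + 1)             # start_stimulus -> end_trial
  # keep counting after end_trial with the same step size as n3
  if (m_row > 0) n4 = seq(1000, by = n3[2] - n3[1], length.out = m_row + 1) else n4 = 0
  result = unique(c(n1, n2, n3, n4))       # segments share endpoints, so drop the duplicates
  tibble(Trial_Time = result[order(result)])
}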
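To see what `f` does for a single trial, here are its `seq()` calls evaluated by hand for Trial1, whose markers sit on rows 1, 4, 9 and 16 with a 395 ms interval and one trailing row after end_trial. Each `seq(..., length.out = k + 1)` splits a segment into `k` equal steps; neighbouring segments share their endpoints, which is why `f` calls `unique()` before sorting. The row numbers are taken from the example data rather than computed:

```r
id <- c(1, 4, 9, 16)   # marker rows of Trial1 in the example
intv <- 395            # start_trial -> start_tone interval for Trial1 (ms)

n1 <- seq(-500, -500 - intv, length.out = id[2] - id[1] + 1)  # 3 steps of ~131.7 ms
n2 <- seq(0, -500, length.out = id[3] - id[2] + 1)            # 5 steps of 100 ms
n3 <- seq(0, 1000, length.out = id[4] - id[3] + 1)            # 7 steps of ~142.9 ms

# One row after end_trial, continuing at the stimulus-to-end step size
n4 <- seq(1000, by = n3[2] - n3[1], length.out = 1 + 1)

# -500, 0 and 1000 each appear in two segments; unique() keeps one copy
trial_time <- sort(unique(c(n1, n2, n3, n4)))
```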
Step 3: Apply that function to each Trial group and cbind the result with the original frame.
cbind(df, data %>% group_modify(f)) %>%
  relocate(Marker, Trial_Time, Trial) %>%
  mutate(Trial = paste0("Trial", Trial))
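An alternative that avoids the joins and the hand-built segments is to treat the problem as linear interpolation: every marker row has a known time, and `stats::approx()` fills in the rows between them. This is a different technique from the answer above, sketched in base R under the assumptions that the data begins with start_trial and that each trial contains start_trial, start_tone, start_stimulus and end_trial exactly once, in that order. Rows after end_trial are extrapolated with the stimulus-to-end step, as in the answer. `trial_time_approx` is a hypothetical name:

```r
markers <- c("start_trial", "", "", "start_tone", "", "", "", "",
             "start_stimulus", "", "", "", "", "", "", "end_trial", "",
             "start_trial", "", "", "", "start_tone", "", "", "",
             "start_stimulus", "", "", "", "end_trial", "", "")
interval_ms <- c(395, 505)  # start_trial -> start_tone, one value per trial

# Hypothetical helper (not from the answer above): interpolate trial time
# from the four marker rows of each trial with stats::approx().
trial_time_approx <- function(markers, intervals) {
  trial <- cumsum(markers == "start_trial")
  out <- numeric(length(markers))
  for (k in unique(trial)) {
    rows <- which(trial == k)
    idx <- seq_along(rows)
    # Position of each marker within this trial (assumes each occurs once)
    pos <- match(c("start_trial", "start_tone", "start_stimulus", "end_trial"),
                 markers[rows])
    # Known times of those markers, relative to stimulus onset
    anchors <- c(-(500 + intervals[k]), -500, 0, 1000)
    # Linear interpolation between marker rows (NA beyond end_trial)
    times <- approx(pos, anchors, xout = idx)$y
    # Rows after end_trial keep the stimulus-to-end step size
    after <- idx > pos[4]
    times[after] <- 1000 + (idx[after] - pos[4]) * 1000 / (pos[4] - pos[3])
    out[rows] <- times
  }
  out
}

result <- data.frame(
  Marker = markers,
  TrialTime = trial_time_approx(markers, interval_ms),
  Trial = paste0("Trial", cumsum(markers == "start_trial"))
)
```

Because `approx()` only needs the marker positions and their times, the same function works for any number of rows per segment without building each `seq()` by hand.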
Output:
           Marker Trial_Time  Trial
1     start_trial  -895.0000 Trial1
2                  -763.3333 Trial1
3                  -631.6667 Trial1
4      start_tone  -500.0000 Trial1
5                  -400.0000 Trial1
6                  -300.0000 Trial1
7                  -200.0000 Trial1
8                  -100.0000 Trial1
9  start_stimulus     0.0000 Trial1
10                  142.8571 Trial1
11                  285.7143 Trial1
12                  428.5714 Trial1
13                  571.4286 Trial1
14                  714.2857 Trial1
15                  857.1429 Trial1
16      end_trial  1000.0000 Trial1
17                 1142.8571 Trial1
18    start_trial -1005.0000 Trial2
19                 -878.7500 Trial2
20                 -752.5000 Trial2
21                 -626.2500 Trial2
22     start_tone  -500.0000 Trial2
23                 -375.0000 Trial2
24                 -250.0000 Trial2
25                 -125.0000 Trial2
26 start_stimulus     0.0000 Trial2
27                  250.0000 Trial2
28                  500.0000 Trial2
29                  750.0000 Trial2
30      end_trial  1000.0000 Trial2
31                 1250.0000 Trial2
32                 1500.0000 Trial2