I have a dataframe that looks like this:
channel start.time stop.time vp duration X id overlaps
1: 4_speech 14.183 16.554 CH1 2.371 NA 165 1_hands_CH1_1.145;2_body_CH1_1.883;2_body_N_14.272;2_body_N_4.825
2: 4_speech 21.196 22.259 CH1 1.063 NA 166 1_hands_N_1.417;2_body_CH1_1.075;2_body_N_5.485
3: 4_speech 28.001 31.518 CH1 3.517 NA 167 1_hands_CH1_3.557;1_hands_CH2_1.75;2_body_CH1_3.445;2_body_N_3.519
4: 4_speech 34.867 36.549 CH1 1.682 NA 168 2_body_CH1_3.308
5: 4_speech 41.019 42.265 CH1 1.246 NA 169 1_hands_CH1_4.896;1_hands_N_0.663;2_body_CH1_5.288
6: 4_speech 55.262 57.800 CH1 2.538 NA 170 2_body_CH1_2.494;2_body_N_6.571
The first 7 columns describe a particular observation. The 8th column, 'overlaps', lists observations from a different data frame that co-occur with the observation in that row. Each entry in the overlaps column is structured as 'channel_vp_duration'. For example, the first entry in 'overlaps' in row 1 shows that '1_hands' is the channel, 'CH1' is the vp (a kind of value), and 1.145 is the duration of that observation.
I want the sum of all the durations for a given observation type. I can sort of get this with the following code, adapted from an answer a Stack user gave to a question I previously asked about how to get the overlaps data in the first place.
library(data.table)
library(stringr)
setDT(speech_rows)
speech_rows[, id := .I]
setkey(speech_rows, id)
#self join on subset by row
speech_rows[speech_rows, durs := {
temp <- df[!id == i.id & start.time < i.stop.time & stop.time > i.start.time & channel == "1_hands", ]
sum(temp$duration)
}, by = .EACHI]
This adds another column, 'durs', which is supposed to show the sum of the durations attached to every '1_hands' string in the overlaps column. It produces the following (first 7 columns removed to save space):
overlaps durs
1: 1_hands_CH1_1.145;2_body_CH1_1.883;2_body_N_14.272;2_body_N_4.825 0.000
2: 1_hands_N_1.417;2_body_CH1_1.075;2_body_N_5.485 1.417
3: 1_hands_CH1_3.557;1_hands_CH2_1.75;2_body_CH1_3.445;2_body_N_3.519 1.750
4: 2_body_CH1_3.308 0.000
5: 1_hands_CH1_4.896;1_hands_N_0.663;2_body_CH1_5.288 5.559
6: 2_body_CH1_2.494;2_body_N_6.571 0.000
But there is a problem: sum() does not capture all of the relevant strings. In row 1, the string "1_hands_CH1_1.145" is the only '1_hands' string, so the value under durs should be 1.145, but the function ignores it for some reason. In row 2, the durs sum is correct. In row 3, it counts only one of the 1_hands values and ignores the other. In row 5, it correctly finds both of the 1_hands values and adds them together. Rows 4 and 6 have correct 'durs' values because there are no 1_hands observations in them.
This is very strange, and I don't know why it detects the numeric values correctly at some times but not at others. This is problem #1.
Problem #2: I cannot specify what I want beyond '1_hands'. What I really want is the sum of durations for all 1_hands_CH1 values, NOT all 1_hands values. To do this, I assumed I would just need to change the string in 'channel == "1_hands"':
temp <- df[!id == i.id & start.time < i.stop.time & stop.time > i.start.time & channel == "1_hands", ]
But if I change it to something like "1_hands_CH1", all of the durs values become zero; it can't match anything past '1_hands'.
So in sum, I want to know why the math isn't working like I want it to, and why I can't select more specific strings.
CodePudding user response:
Here is one way you could get durations out of your overlaps column using the tidyverse. You can set text_string equal to what you want durations for. I have provided some examples of how to enter your text string. The example below returns durations for all "1_hands" observations. If you wanted durations just for "1_hands_CH1", then you would just set text_string <- "1_hands_CH1".
# Load tidyverse
library(tidyverse)
# Set text_string Equal To Specific String You Want Durations For
text_string <- "1_hands_[A-Z0-9]+"
# Examples For text_string
# text_string <- "1_hands_CH1" ## example for getting 1_hands_CH1
# text_string <- "2_body_N" ## example for getting 2_body_N
# text_string <- "1_hands_[A-Z0-9]+" ## example for getting all 1_hands
# text_string <- "2_body_[A-Z0-9]+" ## example for getting all 2_body
# df With Durations
df_with_durs <- df %>%
  as_tibble() %>%
  # Pull every "<text_string>_<duration>" entry out of each overlaps string
  mutate(str_matches = str_match_all(overlaps, str_glue("{text_string}_[0-9.]+")),
         durs = map(str_matches,
                    function(x) {
                      # Strip the label so only the numeric duration remains
                      durs <- str_remove(x, str_glue("{text_string}_"))
                      num_durs <- as.numeric(durs)
                      sum_durs <- sum(num_durs)
                      return(sum_durs)
                    }
         )
  ) %>%
  unnest(cols = durs) %>%
  select(-str_matches)
# View Output
df_with_durs
# channel start.time stop.time vp duration X id overlaps durs
# <chr> <dbl> <dbl> <chr> <dbl> <lgl> <int> <chr> <dbl>
# 4_speech 14.2 16.6 CH1 2.37 NA 165 1_hands_CH1_1.145;2_body_CH1_1.883;2_body_N_14.272;2_body_N_4.825 1.14
# 4_speech 21.2 22.3 CH1 1.06 NA 166 1_hands_N_1.417;2_body_CH1_1.075;2_body_N_5.485 1.42
# 4_speech 28.0 31.5 CH1 3.52 NA 167 1_hands_CH1_3.557;1_hands_CH2_1.75;2_body_CH1_3.445;2_body_N_3.519 5.31
# 4_speech 34.9 36.5 CH1 1.68 NA 168 2_body_CH1_3.308 0
# 4_speech 41.0 42.3 CH1 1.25 NA 169 1_hands_CH1_4.896;1_hands_N_0.663;2_body_CH1_5.288 5.56
# 4_speech 55.3 57.8 CH1 2.54 NA 170 2_body_CH1_2.494;2_body_N_6.571 0
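If you would rather not depend on the tidyverse, the same idea can be sketched in base R: split each overlaps string on ";", keep the entries that start with the label you want, and sum the number after the last underscore. This is only a minimal sketch. sum_overlap_durs() is a hypothetical helper name (not from any package), and it assumes every entry has the 'channel_vp_duration' shape described in the question, with the duration always last.

```r
# Sample 'overlaps' strings copied from the question.
overlaps <- c(
  "1_hands_CH1_1.145;2_body_CH1_1.883;2_body_N_14.272;2_body_N_4.825",
  "1_hands_N_1.417;2_body_CH1_1.075;2_body_N_5.485",
  "1_hands_CH1_3.557;1_hands_CH2_1.75;2_body_CH1_3.445;2_body_N_3.519",
  "2_body_CH1_3.308",
  "1_hands_CH1_4.896;1_hands_N_0.663;2_body_CH1_5.288",
  "2_body_CH1_2.494;2_body_N_6.571"
)

# Hypothetical helper: sum the durations of all entries whose label
# starts with `prefix` (e.g. "1_hands" or "1_hands_CH1").
sum_overlap_durs <- function(overlaps, prefix) {
  vapply(strsplit(overlaps, ";", fixed = TRUE), function(items) {
    # Keep entries that begin with "<prefix>_"; the trailing underscore
    # stops "1_hands_CH1" from also matching e.g. "1_hands_CH10".
    keep <- startsWith(items, paste0(prefix, "_"))
    # The duration is the text after the last underscore.
    durs <- as.numeric(sub(".*_", "", items[keep]))
    # sum() of an empty vector is 0, so rows with no match give 0.
    sum(durs)
  }, numeric(1))
}

all_hands <- sum_overlap_durs(overlaps, "1_hands")      # every 1_hands entry
hands_ch1 <- sum_overlap_durs(overlaps, "1_hands_CH1")  # only 1_hands_CH1
```

Assuming your data frame is called df as above, you could attach the result with df$durs <- sum_overlap_durs(df$overlaps, "1_hands_CH1"). Because the prefix is compared as plain text rather than as a regular expression, you do not need to escape anything in the label.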