I have a data frame, speech_N_rows, that looks like this:
# channel start.time stop.time vp id overlaps
# 1: A 0.000 9.719 N 1 A:EE, C:N
# 2: A 9.719 11.735 N 2 A:N, D:other, A:N
# 3: C 0.264 2.032 N 3 A:N
# 4: B 26.514 28.264 N 4 B:CH1, D:N
# 5: D 82.316 82.702 N 5 C:CH2, B:CH2
# 6: D 10.354 11.666 N 6 A:EE, A:other
# 7: C 80.251 82.719 CH2 7 D:self, B:CH2
# 8: B 27.564 30.819 CH1 8 B:CH1, D:N
# 9: D 25.621 27.693 N 9 B:CH1, B:CH1
#10: A 10.354 11.666 other 10 A:EE, D:other
#11: B 80.251 82.719 CH2 11 D:self, C:CH2
#12: B 61.564 64.819 CH1 12 A:N
#13: A 60.621 62.693 N 13 B:CH1
In the overlaps column, each cell holds one or more strings, separated by ','.
I'm trying to get counts of specific strings, in this case "A:N". But I haven't figured out how to do that yet.
I can get the number of rows in which "A:N" occurs by making a vector of the 'overlaps' column and taking the length of the grep() result:
testdata <- c(speech_N_rows$overlaps)
length(grep("A:N", testdata))
# [1] 3
However there are 4 total instances of "A:N", not 3. I can't figure out how to count multiple occurrences in the column, including multiple occurrences within a single row of the column (as is the case in row 2 of the 'overlaps' column).
Suggestions would be most appreciated.
CodePudding user response:
To count all the instances of A:N, you could use str_count() from the stringr library in combination with sum():
sum(stringr::str_count(df$overlaps, "A:N"))
# [1] 4
stringr::str_count() counts the number of matches of the given pattern in each element, which is why a row containing "A:N" twice contributes 2 rather than 1:
stringr::str_count(df$overlaps, "A:N")
# [1] 0 2 1 0 0 0 0 0 0 0 0 1 0
sum() then adds these per-element counts to give the overall number of instances.
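If you would rather not depend on stringr, the same per-element count can be sketched in base R with gregexpr() and regmatches(). This is only an illustration: the overlaps vector below is a stand-in I typed out from the question's data, and fixed = TRUE is my choice so that "A:N" is treated as a literal string rather than a regular expression.

```r
# Stand-in for df$overlaps, copied from the data in the question
overlaps <- c("A:EE,C:N", "A:N,D:other,A:N", "A:N", "B:CH1,D:N",
              "C:CH2,B:CH2", "A:EE,A:other", "D:self,B:CH2", "B:CH1,D:N",
              "B:CH1,B:CH1", "A:EE,D:other", "D:self,C:CH2", "A:N", "B:CH1")

# gregexpr() finds every match position in each element;
# regmatches() extracts the matched substrings (character(0) when none)
matches <- regmatches(overlaps, gregexpr("A:N", overlaps, fixed = TRUE))

# lengths() gives the number of matches per element
per_row <- lengths(matches)
per_row
# [1] 0 2 1 0 0 0 0 0 0 0 0 1 0

# Total number of occurrences across the whole column
total <- sum(per_row)
total
# [1] 4
```

By contrast, grep() returns at most one index per element, which is why length(grep(...)) gives the number of matching rows (3) rather than the number of matches (4).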
Data
df <- read.table( text = "channel start.time stop.time vp id overlaps
A 0.000 9.719 N 1 A:EE,C:N
A 9.719 11.735 N 2 A:N,D:other,A:N
C 0.264 2.032 N 3 A:N
B 26.514 28.264 N 4 B:CH1,D:N
D 82.316 82.702 N 5 C:CH2,B:CH2
D 10.354 11.666 N 6 A:EE,A:other
C 80.251 82.719 CH2 7 D:self,B:CH2
B 27.564 30.819 CH1 8 B:CH1,D:N
D 25.621 27.693 N 9 B:CH1,B:CH1
A 10.354 11.666 other 10 A:EE,D:other
B 80.251 82.719 CH2 11 D:self,C:CH2
B 61.564 64.819 CH1 12 A:N
A 60.621 62.693 N 13 B:CH1", header = TRUE)
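One caveat: pattern matching counts substrings, so a pattern such as "A:N" would also match inside a longer label like "A:N2" if one ever appeared in the column. That does not happen in this data, but if you need exact labels, a possible sketch is to split each cell on the commas and compare whole tokens. The overlaps vector is again a stand-in copied from the data above, and trimws() is there on the assumption that your real column may have spaces after the commas, as in the printed data frame.

```r
# Stand-in for df$overlaps, copied from the data above
overlaps <- c("A:EE,C:N", "A:N,D:other,A:N", "A:N", "B:CH1,D:N",
              "C:CH2,B:CH2", "A:EE,A:other", "D:self,B:CH2", "B:CH1,D:N",
              "B:CH1,B:CH1", "A:EE,D:other", "D:self,C:CH2", "A:N", "B:CH1")

# Split every cell on "," and flatten into one vector of labels;
# trimws() drops any whitespace left around the labels
tokens <- trimws(unlist(strsplit(overlaps, ",", fixed = TRUE)))

# Count exact matches of a single label
n_exact <- sum(tokens == "A:N")
n_exact
# [1] 4

# Or tabulate every label at once and look up the one you need
label_counts <- table(tokens)
label_counts[["A:N"]]
# [1] 4
```

The table() result is handy if you expect to ask the same question about other labels, since all counts come from a single pass over the column.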