Can't use loop on str_count() function

I have a data frame that has two columns, like this:

USER ID	text
1	"..."
2	"..."
.	.
.	.
.	.
100	"..."

Let's say there are 100 users and each user has a text.

I want to compute the proportion of texts that contain a question mark. For example, if only 20 of the texts contain question marks, the value I should get is 20/100 (I don't care how many question marks are in each text).

I tried to use str_count() and build a loop for it:

for (i in 1:length(data_frame$text)) {
str_count(data_frame$text[i], pattern = "\\?")}

but it just isn't working; it doesn't even produce an error.

CodePudding user response:

If you want to find if there is a question mark in the string (dichotomize as 1/0) you could do this in base R:

df <- data.frame(id = 1:10,
                 text = c(LETTERS[1:5], paste0(LETTERS[1:5],"?")))

df$question_mark <- grepl("\\?", df$text)*1

You can find the proportion by:

sum(df$question_mark) / nrow(df)
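
As a possible shortcut, the 1/0 column is not strictly needed: a logical vector can be averaged directly, because mean() treats TRUE as 1 and FALSE as 0. The sketch below reuses the same toy data; the names prop_question and prop_fixed are only illustrative.

```r
# Same toy data as above: five texts without a question mark, five with one.
df <- data.frame(id = 1:10,
                 text = c(LETTERS[1:5], paste0(LETTERS[1:5], "?")))

# grepl() returns TRUE/FALSE per row; mean() of a logical vector is the
# share of TRUE values, i.e. the proportion of texts containing "?".
prop_question <- mean(grepl("\\?", df$text))

# fixed = TRUE matches the literal "?" so no regex escaping is needed.
prop_fixed <- mean(grepl("?", df$text, fixed = TRUE))

print(prop_question)
```

Both variants should give the same value as sum(df$question_mark) / nrow(df) on this data.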

CodePudding user response:

You may want to use stringr::str_detect(), and you do not need a for loop. Most of the str_* functions are vectorized, and vectorization is one of R's core strengths. (Under the hood it is still a loop, of course, but it is implemented in C, so it is much faster as well as easier to write.)

Consider:

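# Build a small example data frame with five texts
df <- data.frame(test = c("asa", "asa?", "asa??", "asa???", "asa??"))
# Count the texts containing at least one "?" and format as "matches/total"
result <- paste0(sum(stringr::str_detect(df$test, "\\?")), "/", length(df$test))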

print(result)

[1] "4/5"
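
As for why the original loop seemed to do nothing: inside a for loop R does not auto-print the value of an expression, so each str_count() result was computed and then discarded. A minimal sketch of both the repaired loop and the vectorized version follows. It assumes the stringr package is installed, and the small data_frame here is a made-up stand-in for the real data.

```r
library(stringr)  # assumption: stringr is installed

# Hypothetical stand-in for the asker's data_frame (five made-up texts).
data_frame <- data.frame(user_id = 1:5,
                         text = c("hi", "why?", "what??", "ok", "really?"))

# Values are not auto-printed inside a for loop, so store each count
# in a preallocated vector instead of just calling str_count().
counts <- integer(nrow(data_frame))
for (i in seq_along(data_frame$text)) {
  counts[i] <- str_count(data_frame$text[i], pattern = "\\?")
}

# Vectorized equivalent: str_count() accepts the whole column at once.
counts_vec <- str_count(data_frame$text, pattern = "\\?")

# Proportion of texts with at least one question mark.
prop <- mean(counts_vec > 0)
print(prop)
```

The loop and the vectorized call should produce the same counts; the loop is shown only to explain the original behaviour, and the one-line version is usually the better choice.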