Can't use loop on str_count() function

Time:08-30

I have a data frame that has two columns, like this:

USER ID text
1 "..."
2 "..."
. .
. .
. .
100 "..."

Let's say there are 100 users and each user has a text.

I want to compute the proportion of texts that contain a question mark. For example, if only 20 texts contain question marks, the value I should get is 20/100 (I don't care how many question marks are within each text).

I tried to use str_count() and build a loop for it:

for (i in 1:length(data_frame$text)) {
str_count(data_frame$text[i], pattern = "\\?")}

but it just doesn't work; it doesn't even produce an error.

CodePudding user response:

If you want to find if there is a question mark in the string (dichotomize as 1/0) you could do this in base R:

df <- data.frame(id = 1:10,
                 text = c(LETTERS[1:5], paste0(LETTERS[1:5],"?")))

df$question_mark <- grepl("\\?", df$text)*1

You can find the proportion by:

sum(df$question_mark) / nrow(df)
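
As a side note, the same proportion can be computed in one step, because mean() treats TRUE as 1 and FALSE as 0. This is only a minimal sketch in base R on the same toy data; the names has_qm and prop are made up for illustration, and fixed = TRUE is used so that the question mark is matched literally without escaping:

```r
# Same toy data as above: 10 texts, 5 of which end in a question mark
df <- data.frame(id = 1:10,
                 text = c(LETTERS[1:5], paste0(LETTERS[1:5], "?")))

# grepl() returns a logical vector: TRUE where the text contains "?"
has_qm <- grepl("?", df$text, fixed = TRUE)

# The mean of a logical vector is the share of TRUE values
prop <- mean(has_qm)
print(prop)
```

With this data, prop is 0.5, the same value as sum(df$question_mark) / nrow(df).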

CodePudding user response:

You may want to use stringr::str_detect(), and you do not need a for loop. Most of the str_* functions are vectorized, and vectorization is one of R's core strengths. (Under the hood it is still a loop, of course, but it is implemented in C, so it is much faster as well as easier to write.)

Consider:

# Build a small, self-contained example data frame
df <- data.frame(test = c("asa", "asa?", "asa??", "asa???", "asa??"))
result <- paste0(sum(stringr::str_detect(df$test, "\\?")), "/", length(df$test))

print(result)

[1] "4/5"
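
As for why the original loop seemed to do nothing: inside a for loop R does not auto-print the value of an expression, so each str_count() result was computed and then discarded. The sketch below shows one way the loop could be made to keep its results, next to the vectorized call. It assumes the stringr package is installed, and the names texts, counts, counts_vec and prop are made up for illustration:

```r
library(stringr)  # assumes the stringr package is installed

texts <- c("asa", "asa?", "asa??", "asa???", "asa??")

# Loop version: store each count, because a bare expression
# inside for() is evaluated but never printed
counts <- integer(length(texts))
for (i in seq_along(texts)) {
  counts[i] <- str_count(texts[i], pattern = "\\?")
}

# Vectorized version: one call over the whole vector gives the same counts
counts_vec <- str_count(texts, pattern = "\\?")

# Proportion of texts with at least one question mark
prop <- mean(counts_vec > 0)

print(counts)
print(prop)
```

Here counts is 0 1 2 3 2 and prop is 0.8, which matches the 4/5 above.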