How to extract different patterns in string in R?

Time:01-18

I want to extract a pattern of phrases from the following sentences.

text1 <- "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year."

text2 <- "There is no confirmed audited number of subscribers in the Netflix's earnings report."

text3 <- "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter."

The phrase to extract is one of: number of subscribers, audited number of subscribers, or unaudited number of subscribers.

I am using the pattern \\bnumber\\s+of\\s+subscribers?\\b from a previous problem (thanks to @wiktor-stribiżew) to extract the phrases.

library(stringr) # provides str_extract()

find_words <- function(text){
  
  # matches only the bare phrase "number of subscriber(s)"
  pattern <- "\\bnumber\\s+of\\s+subscribers?\\b" # something like this

  str_extract(text, pattern)

}

However, this only extracts the exact phrase number of subscribers, not the audited and unaudited variants.

Desired output:

find_words(text1)

'number of subscribers'

find_words(text2)

'audited number of subscribers'

find_words(text3)

'unaudited number of subscribers'

CodePudding user response:

See if this works

find_words <- function(text){

  # optional "audited " or "unaudited " prefix before the core phrase
  pattern <- "(audited |unaudited )?number\\s+of\\s+subscribers"

  str_extract(text, pattern)

}

You can test it with the sample texts you provided:

find_words(text1)
# 'number of subscribers'
find_words(text2)
# 'audited number of subscribers'
find_words(text3)
# 'unaudited number of subscribers'
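
If you want something a bit stricter, here is a minimal sketch of a variant, not a definitive solution. It assumes you also want word boundaries (so the match cannot start inside a longer word such as "reaudited") and that you want to run all sentences in one call. The fourth sentence is a made-up non-matching input added only for illustration.

```r
library(stringr)

# Optional "audited" or "unaudited" prefix, then the core phrase.
# \\b stops the match from starting or ending inside a longer word, and
# \\s+ tolerates one or more whitespace characters between the words.
pattern <- "\\b(?:(?:un)?audited\\s+)?number\\s+of\\s+subscribers?\\b"

texts <- c(
  "On a year-on-year basis, the number of subscribers of Netflix increased 1.15% in November last year.",
  "There is no confirmed audited number of subscribers in the Netflix's earnings report.",
  "Netflix's unaudited number of subscribers has grown more than 1.50% at the last quarter.",
  "This sentence has no matching phrase."  # hypothetical input, not from the question
)

# str_extract() is vectorised: one result per input, NA where nothing matches.
found <- str_extract(texts, pattern)
print(found)
```

The (?:un)? part makes the "un" prefix optional, so a single group covers both audited and unaudited, and the outer (?: ... )? makes the whole prefix optional for the plain phrase.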