How to extract characters following a pattern and remove the rest?

Time:10-24

I'm trying to create a retweet network from raw tweet text I have. The text is formatted like this:

tweet_vector <- c("RT @person: tweet tweet tweet",
                  "RT @otherperson: tweet tweet",
                  "Tweet, this isn't a retweet, @3rdperson.",
                  "RT @4thperson: this retweet also has a mention, @mentioned")

I want to create a function that returns the following:

[1] "person"
[2] "otherperson"
[3] NA
[4] "4thperson"

I can't just use str_extract(tweet_vector, "@\\w+"), because that would also catch @3rdperson.
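The problem shows up when you try a naive pattern that looks only for "@" followed by word characters. The snippet below is a base-R sketch that uses regexpr()/regmatches() with perl = TRUE instead of stringr, so it needs no packages. It returns "@3rdperson" for the third element, which is not a retweet:

```r
tweet_vector <- c("RT @person: tweet tweet tweet",
                  "RT @otherperson: tweet tweet",
                  "Tweet, this isn't a retweet, @3rdperson.",
                  "RT @4thperson: this retweet also has a mention, @mentioned")

# "@" plus word characters has no notion of "retweet", so the first mention
# in every tweet is returned, including the plain mention in tweet 3
naive <- regmatches(tweet_vector, regexpr("@\\w+", tweet_vector, perl = TRUE))
print(naive)
```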

Answer:

library(stringr)

str_extract(tweet_vector, "(?<=@)\\w+(?=:)")
[1] "person"      "otherperson" NA            "4thperson"  
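If you would rather avoid the stringr dependency, the same lookbehind/lookahead pattern should also work in base R with perl = TRUE. Here is a sketch. regmatches() silently drops elements that did not match, so the result starts out as all NA and only the matched positions get filled in:

```r
tweet_vector <- c("RT @person: tweet tweet tweet",
                  "RT @otherperson: tweet tweet",
                  "Tweet, this isn't a retweet, @3rdperson.",
                  "RT @4thperson: this retweet also has a mention, @mentioned")

# (?<=@) requires an "@" just before the handle and (?=:) a ":" just after it;
# both are zero-width, so neither character is part of the match
m <- regexpr("(?<=@)\\w+(?=:)", tweet_vector, perl = TRUE)

# regexpr() gives -1 where nothing matched; regmatches() skips those entries,
# so fill only the positions that matched and leave the rest as NA
handles <- rep(NA_character_, length(tweet_vector))
handles[m != -1] <- regmatches(tweet_vector, m)
handles
```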


str_extract(tweet_vector, "(?<=RT @)\\w+")
[1] "person"      "otherperson" NA            "4thperson"  
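The "RT @" lookbehind also fires when "RT @" appears in the middle of a tweet. Suppose you only want tweets that start with the retweet marker, which is an assumption about how your data is formatted. One base-R sketch anchors the pattern with ^ and captures the handle with a group via regexec(), instead of using a lookbehind:

```r
tweet_vector <- c("RT @person: tweet tweet tweet",
                  "RT @otherperson: tweet tweet",
                  "Tweet, this isn't a retweet, @3rdperson.",
                  "RT @4thperson: this retweet also has a mention, @mentioned")

# "^RT @" pins the match to the start of the tweet; (\\w+) captures the handle
parts <- regmatches(tweet_vector,
                    regexec("^RT @(\\w+)", tweet_vector, perl = TRUE))

# Each element is character(0) when there is no match,
# otherwise c(full_match, captured_handle)
handles <- vapply(parts,
                  function(p) if (length(p) == 2) p[2] else NA_character_,
                  character(1))
handles
```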

sub(".*?@(\\w+):.*|.*", "\\1", tweet_vector)
[1] "person"      "otherperson" ""            "4thperson"  
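The question asks for a function that returns NA for non-retweets. The sub() call returns "" instead, so it can be wrapped to convert those empty strings. Below is a minimal sketch; the name get_retweeted_user() is hypothetical, and the regex is the one from the sub() call as-is:

```r
# Hypothetical helper name; assumes single-line tweets
get_retweeted_user <- function(tweets) {
  # First alternative: skip lazily to the first "@", capture the handle and
  # require a ":" right after it. The "|.*" alternative matches any other
  # tweet whole, leaving the capture group empty.
  user <- sub(".*?@(\\w+):.*|.*", "\\1", tweets)
  # Non-retweets come back as "", so turn those into NA
  user[user == ""] <- NA_character_
  user
}

tweet_vector <- c("RT @person: tweet tweet tweet",
                  "RT @otherperson: tweet tweet",
                  "Tweet, this isn't a retweet, @3rdperson.",
                  "RT @4thperson: this retweet also has a mention, @mentioned")
get_retweeted_user(tweet_vector)
```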