I have a dataframe that looks like so:
ID | Tweet_ID | Tweet
1 | 12345 | @sprintcare I did.
2 | SPRINT | @12345 Please send us a Private Message.
3 | 45678 | @apple My information is incorrect.
4 | APPLE | @45678 What information is incorrect.
What I would like to do is use something like a case_when statement to create a new field that holds the company handle for tweets that mention one, and NA for tweets whose handle is purely numeric.
Here is the code I have been experimenting with, so far without success:
tweet_pattern <- " @[^0-9.-]\\w+"
Customer <- Customer %>%
  mutate(Response_To_Comp = ifelse(str_detect(Tweet, tweet_pattern),
                                   str_extract(Tweet, tweet_pattern),
                                   NA_character_))
Desired output:
ID | Tweet_ID | Tweet | Response_To_Comp
1 | 12345 | @sprintcare I did. | sprintcare
2 | SPRINT | @12345 Please send us a Private Message. | NA
3 | 45678 | @apple My information is incorrect. | apple
4 | APPLE | @45678 What information is incorrect. | NA
CodePudding user response:
You can use a lookbehind regex to extract the text that comes immediately after '@' and consists of one or more A-Za-z characters.
library(dplyr)
library(stringr)
# (?<=@) requires an '@' just before the match; [A-Za-z]+ matches one or more letters
tweet_pattern <- "(?<=@)[A-Za-z]+"
df %>% mutate(Response_To_Comp = str_extract(Tweet, tweet_pattern))
# ID Tweet_ID Tweet Response_To_Comp
#1 1 12345 @sprintcare I did. sprintcare
#2 2 SPRINT @12345 Please send us a Private Message. <NA>
#3 3 45678 @apple My information is incorrect. apple
#4 4 APPLE @45678 What information is incorrect. <NA>
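To see what the lookbehind contributes, the pattern can be tried on a plain character vector with and without it. This is only a small sketch: the `tweets` vector and the two result names are made up for illustration, with the strings copied from the question's data.

```r
library(stringr)

# Illustrative sample, copied from the question's data
tweets <- c("@sprintcare I did.",
            "@12345 Please send us a Private Message.")

# (?<=@) only asserts that '@' precedes the match; it is not part of the
# result, so the handle comes back without the '@'
with_lookbehind <- str_extract(tweets, "(?<=@)[A-Za-z]+")

# Without the lookbehind the '@' is consumed and returned with the handle
without_lookbehind <- str_extract(tweets, "@[A-Za-z]+")

print(with_lookbehind)
print(without_lookbehind)
```

In both cases the numeric handle yields NA, because str_extract returns NA when nothing matches. That is why no separate ifelse or str_detect step is needed with this approach.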
CodePudding user response:
Using str_detect and str_replace:
library(stringr)
library(dplyr)
Customer %>%
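  # str_detect keeps only tweets whose '@' is followed by a non-digit;
  # rows with no matching condition become NA
  mutate(Response_to_Comp = case_when(str_detect(Tweet, "@[^0-9-]+") ~
    str_replace(Tweet, "@([A-Za-z]+)\\s+.*", "\\1")))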
ID Tweet_ID Tweet Response_to_Comp
1 1 12345 @sprintcare I did. sprintcare
2 2 SPRINT @12345 Please send us a Private Message. <NA>
3 3 45678 @apple My information is incorrect. apple
4 4 APPLE @45678 What information is incorrect. <NA>
data
Customer <- structure(list(ID = 1:4, Tweet_ID = c("12345", "SPRINT", "45678",
"APPLE"), Tweet = c("@sprintcare I did.", "@12345 Please send us a Private Message.",
"@apple My information is incorrect.", "@45678 What information is incorrect."
)), class = "data.frame", row.names = c(NA, -4L))
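The str_detect guard in this approach matters because str_replace leaves a string untouched when the pattern does not match, so a numeric-handle tweet would otherwise be copied whole into the new column. A minimal sketch of that behaviour, using one tweet from the data above (the variable names `tweet`, `unguarded`, and `guarded` are illustrative only):

```r
library(stringr)

tweet <- "@12345 Please send us a Private Message."

# No letters directly after '@', so the pattern does not match and
# str_replace returns the input unchanged
unguarded <- str_replace(tweet, "@([A-Za-z]+)\\s+.*", "\\1")

# Checking with str_detect first lets us return NA instead
guarded <- if (str_detect(tweet, "@[^0-9-]+")) unguarded else NA_character_

print(unguarded)
print(guarded)
```

Inside mutate, case_when plays the role of the `if` here: rows that match no condition get NA.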