Hi everyone, and sorry for my English. I need to extract the domain from the email addresses in a table. If an address ends in a country-code domain, that code should be copied into a second, partially filled column that records the country of each congress participant. The database is relatively large. An example is below.
| email | country |
| -------- | -------------- |
| naco@gmail.com | CO |
| monic45814@gmail.com | AR |
| jsalazar@chapingo.mx | |
| andresramirez@urosario.edu.co | |
| jeimy861491@hotmail.com | CL |
| jytvc@hotmail.com | |

Outcome should be

| email | country |
| -------- | -------------- |
| naco@gmail.com | CO |
| monic45814@gmail.com | AR |
| jsalazar@chapingo.mx | MX |
| andresramirez@urosario.edu.co | CO |
| jeimy861491@hotmail.com | CL |
| jytvc@hotmail.com | *NA* |
Thank you so much.
CodePudding user response:
You can use `str_extract` to get the string after the last occurrence of "." and `if_else` to skip rows that already have a country and rows whose e-mail doesn't end with a country code:
library(dplyr)
library(stringr)
df %>%
  mutate(country = if_else(is.na(country) & str_extract(email, "[^.]+$") != "com", toupper(str_extract(email, "[^.]+$")), country))
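To see what the pattern `[^.]+$` picks out, here is a minimal sketch that runs the same extraction with base R's `regmatches()` and `regexpr()` instead of `str_extract`. The vector names `tld` and `candidate` are made up for illustration, and the addresses are a subset of those in the question.

```r
# Sample addresses taken from the question's table
email <- c("naco@gmail.com", "jsalazar@chapingo.mx",
           "andresramirez@urosario.edu.co", "jytvc@hotmail.com")

# "[^.]+$" matches the run of non-dot characters at the end of the string,
# i.e. everything after the last "."
tld <- regmatches(email, regexpr("[^.]+$", email))
print(tld)        # "com" "mx" "co" "com"

# Upper-case every suffix other than "com"; leave the rest as NA
candidate <- ifelse(tld != "com", toupper(tld), NA_character_)
print(candidate)  # NA "MX" "CO" NA
```

Note that comparing against "com" only filters out that one suffix. Addresses ending in "org", "net" or similar would still be treated as country codes.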
A small but not so small PS: I would always recommend providing fake data when a question involves personal data such as e-mail addresses.
CodePudding user response:
Here is a solution in base R.
Suppose:
df<-data.frame(email,country)
Then:
df$country <- ifelse(is.na(df$country) & sub(".*\\.", "", df$email) != "com", toupper(sub(".*\\.", "", df$email)), as.character(df$country))
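For a version that can be run end to end, the following sketch rebuilds the question's table and fills the missing countries in base R. It rests on two assumptions that are not in the original posts: empty country cells are stored as `NA`, and only two-letter suffixes count as country codes (so "com", "org", "net" and the like are all skipped). The helper names `tld`, `is_cc` and `fill` are illustrative.

```r
# Data from the question; empty country cells are represented as NA (assumption)
df <- data.frame(
  email = c("naco@gmail.com", "monic45814@gmail.com", "jsalazar@chapingo.mx",
            "andresramirez@urosario.edu.co", "jeimy861491@hotmail.com",
            "jytvc@hotmail.com"),
  country = c("CO", "AR", NA, NA, "CL", NA),
  stringsAsFactors = FALSE
)

# Everything after the last "." in each address
tld <- sub(".*\\.", "", df$email)

# Assumption: only two-letter suffixes are treated as country codes
is_cc <- grepl("^[A-Za-z]{2}$", tld)

# Fill only the rows where country is missing and a country code was found
fill <- is.na(df$country) & is_cc
df$country[fill] <- toupper(tld[fill])

print(df)
```

Assigning through a logical index touches only the rows that need a value, so existing countries are left exactly as they were and unmatched rows stay `NA`.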