How to replace empty spaces with values from an adjacent column that needs to be separated?

Time:11-26

Hi everyone, and sorry for my English. I need to extract the domain from the e-mail addresses in a table. If an address ends in a country domain, that country code must be copied into the country column, which is incomplete, for the congress participants listed. The database is relatively large. I put an example below.

| email                         | country        | 
| --------                      | -------------- | 
| naco@gmail.com                | CO             |
| monic45814@gmail.com          | AR             |
| jsalazar@chapingo.mx          |                | 
| andresramirez@urosario.edu.co |                |
| jeimy861491@hotmail.com       | CL             |
| jytvc@hotmail.com             |                |

The outcome should be:

| email                         | country        | 
| --------                      | -------------- | 
| naco@gmail.com                | CO             |
| monic45814@gmail.com          | AR             |
| jsalazar@chapingo.mx          | MX             | 
| andresramirez@urosario.edu.co | CO             |
| jeimy861491@hotmail.com       | CL             |
| jytvc@hotmail.com             | *NA*           |

Thank you so much.

CodePudding user response:

You can use str_extract to get the string after the last occurrence of "." and if_else to skip rows that already have a country and rows whose e-mail doesn't end with a country code (here, rows ending in "com"):

library(dplyr)
library(stringr)

df %>%
  mutate(country = if_else(is.na(country) & str_extract(email, "[^.]+$") != "com", toupper(str_extract(email, "[^.]+$")), country))

A small but not so small PS: I would always recommend providing fake data when your example involves personal data such as e-mail addresses.
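
For anyone who wants to try the approach above end to end, here is a minimal sketch on made-up addresses. It assumes the empty cells may arrive either as NA or as blank strings (the question does not say which), so blanks are converted to NA first with na_if; the temporary tld column is only an illustration name, not part of the original answer.

```r
library(dplyr)
library(stringr)

# Made-up addresses standing in for the real participant list
df <- tibble(
  email   = c("a@example.com", "b@example.mx", "c@example.edu.co", "d@example.com"),
  country = c("CO", "", NA, NA)
)

result <- df %>%
  mutate(
    country = na_if(country, ""),            # assumption: blank cells mean "missing"
    tld     = str_extract(email, "[^.]+$"),   # text after the last dot
    country = if_else(is.na(country) & tld != "com", toupper(tld), country)
  ) %>%
  select(-tld)                                # drop the helper column

print(result)
```

Rows that already have a country keep it, and rows ending in "com" with no country stay NA.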

CodePudding user response:

Here is a solution in base R.

Suppose:

df <- data.frame(email, country)

Then:

df$country <- ifelse(is.na(df$country) & sub(".*\\.", "", df$email) != "com", toupper(sub(".*\\.", "", df$email)), as.character(df$country))
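
The base R idea can also be sketched as a small self-contained function on made-up data. This variant is an assumption on my part rather than the answer's method: instead of excluding only "com", it fills the country only when the last label of the domain is exactly two letters, since two-letter top-level domains are country codes. The helper name fill_country_from_tld is hypothetical, and blank strings are again assumed to mean "missing".

```r
# Made-up data; blank strings stand in for empty cells
df <- data.frame(
  email   = c("a@example.com", "b@example.mx", "c@example.edu.co", "d@example.org"),
  country = c("CO", "", NA, ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill missing countries from two-letter TLDs
fill_country_from_tld <- function(email, country) {
  country[!is.na(country) & country == ""] <- NA  # blanks become NA
  tld   <- sub(".*\\.", "", email)                # text after the last dot
  is_cc <- grepl("^[A-Za-z]{2}$", tld)            # two letters: treated as a country code
  fill  <- is.na(country) & is_cc                 # only touch rows with no country
  country[fill] <- toupper(tld[fill])
  country
}

df$country <- fill_country_from_tld(df$email, df$country)
print(df)
```

Addresses ending in generic domains such as "com" or "org" are left as NA, which matches the expected output in the question.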