Create third column that might be based on one of two columns

Time:03-24

I have data like this: (many more columns not shown here)

    df <- structure(list(email = c("[email protected]", "[email protected]",
    "[email protected]", "[email protected]"), employee_number = c(123456,
    654321, 664422, 321458)), row.names = c(NA, -4L), class = c("tbl_df",
    "tbl", "data.frame"))

I need to make a third column called "username". The username is usually just everything before the @ in the email, unless that part contains a period or a number, in which case it should be the employee number instead.

In other words, I'm hoping to get results like this:

[image: table of the expected results]

Any help would be appreciated!

Answer:

We could use str_detect on the substring of 'email' before the @ to check for a . or any digit. If there is a match, return 'employee_number' as a character string; otherwise remove the suffix part of 'email' (the @ and everything after it) with str_remove.

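library(dplyr)
library(stringr)
df <- df %>% 
   mutate(username = case_when(
      # the part before the @ contains a period or a digit: use the employee number
      str_detect(trimws(email, whitespace = "@.*"), "[.0-9]")
         ~ as.character(employee_number),
      # otherwise keep everything before the @
      TRUE ~ str_remove(email, "@.*")))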

-output

df
# A tibble: 4 × 3
  email               employee_number username 
  <chr>                         <dbl> <chr>    
1 [email protected]           123456 lbelcher 
2 [email protected]          654321 bbelchery
3 [email protected]            664422 664422   
4 [email protected]            321458 321458   
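
The same rule can also be sketched in base R, without dplyr or stringr. This is only an illustration under assumptions: the data frame `df2` and its example.com addresses are made up here, because the addresses in the post are obscured, and `sub`/`grepl`/`ifelse` are swapped in for `str_remove`/`str_detect`/`case_when`.

```r
# Hypothetical sample data (made-up addresses, not the ones from the question)
df2 <- data.frame(
  email = c("lbelcher@example.com", "bob.belcher@example.com",
            "tina2@example.com", "gene@example.com"),
  employee_number = c(123456, 654321, 664422, 321458)
)

# Everything before the "@"
local_part <- sub("@.*", "", df2$email)

# Use the employee number when the local part contains a period or a digit,
# otherwise use the local part itself
df2$username <- ifelse(grepl("[.0-9]", local_part),
                       as.character(df2$employee_number),
                       local_part)

df2$username
```

Computing the part before the @ once and reusing it for both the test and the result avoids repeating the pattern, at the cost of an extra intermediate vector.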