I have data like this (there are many more columns not shown here):
df<-structure(list(email = c("[email protected]", "[email protected]",
"[email protected]", "[email protected]"), employee_number = c(123456,
654321, 664422, 321458)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
I need to make a third column called "username". The username is usually everything before the @ in the email address, UNLESS that part contains a period or a number, in which case it should be the employee number.
In other words, I'm hoping to get results like this:
Any help would be appreciated!
Answer:
We could use str_detect on the part of 'email' before the @ to look for a . or a digit. If one is found, return 'employee_number'; otherwise, remove the domain part of 'email' (the @ and everything after it) with str_remove:
library(dplyr)
library(stringr)

df <- df %>%
  mutate(username = case_when(
    # a period or digit before the @ -> use the employee number
    str_detect(str_remove(email, "@.*"), "[.0-9]") ~ as.character(employee_number),
    # otherwise keep everything before the @
    TRUE ~ str_remove(email, "@.*")
  ))
Output:
df
# A tibble: 4 × 3
email employee_number username
<chr> <dbl> <chr>
1 [email protected] 123456 lbelcher
2 [email protected] 654321 bbelchery
3 [email protected] 664422 664422
4 [email protected] 321458 321458
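If you would rather not depend on dplyr and stringr, the same rule can be sketched in base R with sub() and grepl(). The sample data below is hypothetical (the real addresses in the question are obfuscated), so adjust it to your own data frame:

```r
# Hypothetical sample data; the real addresses in the question are obfuscated
df <- data.frame(
  email = c("lbelcher@example.com", "bbelchery@example.com",
            "t.belcher@example.com", "gene2@example.com"),
  employee_number = c(123456, 654321, 664422, 321458)
)

# Everything before the first @ (the local part of the address)
local_part <- sub("@.*", "", df$email)

# Use the employee number when the local part contains a period or a digit,
# otherwise use the local part itself
df$username <- ifelse(grepl("[.0-9]", local_part),
                      as.character(df$employee_number),
                      local_part)

print(df$username)
```

Here "t.belcher" (period) and "gene2" (digit) fall back to their employee numbers, while the other two keep the part before the @.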