Split names with different options based on characters

Time:11-29

I have a dataset where names have been entered inconsistently. Some names have been entered as first name, space, last name, while others have been entered as last name, comma, first name. I need all of them to read last name, comma, first name. I would like to keep the data within the dataframe, but I can append it back if there is no other way to do it. Here is an example of the dataframe:

Names Other_Column
Smith, John ...
Sam Miller ...
Anderson, Sam ...
Williams, Jacob ...
Susan Styles ...
Burke, David ...

I have tried a case_when statement after piping in the dataframe, but that didn't work. I have also tried grepl and str_split.

CodePudding user response:

library(dplyr)
quux %>%
  mutate(
    Names = if_else(grepl(",", Names),
                    Names,
                    sub("^(.+)\\s+(\\S+)$", "\\2, \\1", Names))
  )
#             Names Other_Column
# 1     Smith, John          ...
# 2     Miller, Sam          ...
# 3   Anderson, Sam          ...
# 4 Williams, Jacob          ...
# 5   Styles, Susan          ...
# 6    Burke, David          ...

Regex:

^(.+)\\s+(\\S+)$
^                 beginning-of-string
 (^^)             group of anything (1-or-more)
     ^^^^         blank-space (1-or-more)
         (^^^^)   group of non-blank-space characters (1-or-more)
               ^  end-of-string

If there is a comma, nothing is changed. If there is no comma, this takes the last "word" (blank-delimited) and moves it to the front with a comma.
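
As a quick check of the behaviour described above, here is a minimal base-R sketch that applies the same pattern to a few names, including a three-word name and a single-word name. The helper name `to_last_first` and the extra sample names are made up for illustration; they are not part of the original data.

```r
# Hypothetical helper wrapping the regex from the answer above (base R only).
to_last_first <- function(x) {
  # Entries that already contain a comma are returned unchanged;
  # otherwise the last blank-delimited word is moved to the front.
  ifelse(grepl(",", x), x, sub("^(.+)\\s+(\\S+)$", "\\2, \\1", x))
}

to_last_first(c("Smith, John", "Sam Miller", "Mary Ann Jones", "Cher"))
# [1] "Smith, John"     "Miller, Sam"     "Jones, Mary Ann" "Cher"
```

Note that only the final word is treated as the last name, so a multi-word surname would be split in the wrong place, and a single-word entry is left as it is because the pattern needs at least one blank to match.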


Data

quux <- structure(list(Names = c("Smith, John", "Sam Miller", "Anderson, Sam", "Williams, Jacob", "Susan Styles", "Burke, David"), Other_Column = c("...", "...", "...", "...", "...", "...")), class = "data.frame", row.names = c(NA, -6L))

CodePudding user response:

You can also do the following.

library(tidyverse)

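df %>%
  # split on runs of non-alphanumeric characters (the default sep); keep Names
  separate(Names, into = c("first", "second"), remove = FALSE) %>%
  transmute(Names = Names,
            # names that already contain a comma are left as they are
            new_names = case_when(str_detect(Names, ",") ~ Names,
                                  TRUE ~ str_c(second, first, sep = ", ")))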

# A tibble: 6 × 2
#   Names           new_names      
#   <chr>           <chr>          
# 1 Smith, John     Smith, John    
# 2 Sam Miller      Miller, Sam    
# 3 Anderson, Sam   Anderson, Sam  
# 4 Williams, Jacob Williams, Jacob
# 5 Susan Styles    Styles, Susan  
# 6 Burke, David    Burke, David

Data

df <- tibble(Names = c("Smith, John", "Sam Miller", "Anderson, Sam", "Williams, Jacob", "Susan Styles", "Burke, David"))

CodePudding user response:

This might help you:

df <-
tibble::tribble(
             ~Names, ~Other_Column,
      "Smith, John",         "...",
       "Sam Miller",         "...",
    "Anderson, Sam",         "...",
  "Williams, Jacob",         "...",
     "Susan Styles",         "...",
     "Burke, David",         "..."
  )

library(stringr)
library(dplyr)

change_name <- 
  function(x){
    if(!str_detect(x,",")){
      aux <- str_split(x,pattern = " ")[[1]]
      output <- str_c(aux[2],", ",aux[1])
    }else{
      output <- x
    }
    return(output)
  }

df %>% 
  rowwise() %>% 
  mutate(new_name = change_name(Names))

# A tibble: 6 x 3
# Rowwise: 
#   Names           Other_Column new_name       
#   <chr>           <chr>        <chr>          
# 1 Smith, John     ...          Smith, John    
# 2 Sam Miller      ...          Miller, Sam    
# 3 Anderson, Sam   ...          Anderson, Sam  
# 4 Williams, Jacob ...          Williams, Jacob
# 5 Susan Styles    ...          Styles, Susan  
# 6 Burke, David    ...          Burke, David
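
The rowwise() step is there because if() only looks at a single value, so a function like change_name has to be called one name at a time. As a rough sketch of the same idea without dplyr or stringr, the one-name-at-a-time function can be applied over a plain character vector with vapply(). The name `change_name_base` is hypothetical and the function is a base-R rewrite for illustration, not the code from the answer.

```r
# Hypothetical base-R variant of change_name: expects exactly one name.
change_name_base <- function(x) {
  if (!grepl(",", x, fixed = TRUE)) {
    # "First Last" -> c("First", "Last") -> "Last, First"
    parts <- strsplit(x, " ", fixed = TRUE)[[1]]
    paste0(parts[2], ", ", parts[1])
  } else {
    x
  }
}

names_in <- c("Smith, John", "Sam Miller", "Susan Styles")

# vapply calls the function once per element and checks each result
# is a single string, which plays the role of rowwise() here.
vapply(names_in, change_name_base, character(1), USE.NAMES = FALSE)
# [1] "Smith, John"   "Miller, Sam"   "Styles, Susan"
```

Like the original function, this sketch assumes a name without a comma has exactly two space-separated parts.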