I have a dataset where names have been entered inconsistently. Some names have been entered as first name, space, last name, while others have been entered as last name, comma, first name. I need all of them to read last name, comma, first name. I would like to keep the data within the dataframe, but I can append it back if there is no other way to do it. Here is an example of the dataframe:
Names | Other_Column |
---|---|
Smith, John | ... |
Sam Miller | ... |
Anderson, Sam | ... |
Williams, Jacob | ... |
Susan Styles | ... |
Burke, David | ... |
I have tried a case_when statement after piping in the dataframe, but that didn't work. I have also tried grepl and str_split.
CodePudding user response:
library(dplyr)
quux %>%
  mutate(
    # Keep names that already contain a comma; otherwise move the last word to the front
    Names = if_else(grepl(",", Names),
                    Names,
                    sub("^(.+)\\s+(\\S+)$", "\\2, \\1", Names))
  )
# Names Other_Column
# 1 Smith, John ...
# 2 Miller, Sam ...
# 3 Anderson, Sam ...
# 4 Williams, Jacob ...
# 5 Styles, Susan ...
# 6 Burke, David ...
Regex:
^(.+)\\s+(\\S+)$
^                 beginning-of-string
 (^^)             group of anything (1-or-more)
     ^^^^         blank-space (1-or-more)
         (^^^^)   group of non-blank-space characters (1-or-more)
               ^  end-of-string
If there is a comma, nothing is changed. If there is no comma, this takes the last "word" (blank-delimited) and moves it to the front with a comma.
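To see the pattern in isolation, here is a small base-R sketch that applies the same `grepl()`/`sub()` logic to a few made-up names. The helper name `to_last_first` and the vector `examples` are illustrative assumptions, not part of the answer above. Because `.+` is greedy, only the final blank-delimited word is treated as the last name, and a single-word name is left untouched because the pattern does not match.

```r
# Hypothetical helper wrapping the same logic as the mutate() call:
# names with a comma are kept, otherwise the last word moves to the front.
to_last_first <- function(x) {
  ifelse(grepl(",", x), x, sub("^(.+)\\s+(\\S+)$", "\\2, \\1", x))
}

# Made-up test names, including a three-word name and a single-word name
examples <- c("Sam Miller", "Smith, John", "Mary Ann Jones", "Cher")
to_last_first(examples)
# returns c("Miller, Sam", "Smith, John", "Jones, Mary Ann", "Cher")
```

Names such as "Mary Ann Jones" are a judgment call: this treats "Jones" as the last name, which may not be right for multi-word surnames.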
Data
quux <- structure(list(Names = c("Smith, John", "Sam Miller", "Anderson, Sam", "Williams, Jacob", "Susan Styles", "Burke, David"), Other_Column = c("...", "...", "...", "...", "...", "...")), class = "data.frame", row.names = c(NA, -6L))
CodePudding user response:
You can also do the following.
library(tidyverse)
df %>%
  separate(Names, into = c("first", "second"), remove = FALSE) %>%
  transmute(Names = Names,
            new_names = case_when(str_detect(Names, ",") ~ Names,
                                  TRUE ~ str_c(second, first, sep = ", ")))
# A tibble: 6 × 2
# Names new_names
# <chr> <chr>
# 1 Smith, John Smith, John
# 2 Sam Miller Miller, Sam
# 3 Anderson, Sam Anderson, Sam
# 4 Williams, Jacob Williams, Jacob
# 5 Susan Styles Styles, Susan
# 6 Burke, David Burke, David
Data
df <- tibble(Names = c("Smith, John", "Sam Miller", "Anderson, Sam", "Williams, Jacob", "Susan Styles", "Burke, David"))
CodePudding user response:
This might help you:
df <-
  tibble::tribble(
    ~Names, ~Other_Column,
    "Smith, John", "...",
    "Sam Miller", "...",
    "Anderson, Sam", "...",
    "Williams, Jacob", "...",
    "Susan Styles", "...",
    "Burke, David", "..."
  )
library(stringr)
library(dplyr)
# Handles a single name at a time (plain `if`), hence rowwise() below
change_name <-
  function(x){
    if(!str_detect(x, ",")){
      # No comma: split "First Last" on the space and swap the parts
      aux <- str_split(x, pattern = " ")[[1]]
      output <- str_c(aux[2], ", ", aux[1])
    }else{
      # Already "Last, First": leave unchanged
      output <- x
    }
    return(output)
  }
df %>%
  rowwise() %>%
  mutate(new_name = change_name(Names))
# A tibble: 6 x 3
# Rowwise:
# Names Other_Column new_name
# <chr> <chr> <chr>
# 1 Smith, John ... Smith, John
# 2 Sam Miller ... Miller, Sam
# 3 Anderson, Sam ... Anderson, Sam
# 4 Williams, Jacob ... Williams, Jacob
# 5 Susan Styles ... Styles, Susan
# 6 Burke, David ... Burke, David
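`change_name()` uses a plain `if`, which only handles one value at a time, and that is why `rowwise()` is needed. As a rough sketch, a vectorised variant in base R could be used in a plain `mutate()` without `rowwise()`. The name `change_name_vec` is hypothetical, and the sketch assumes, like the function above, that names without a comma are exactly two words separated by a single space.

```r
# Hypothetical vectorised variant of change_name(); base R only.
# Assumes comma-less names are exactly "First Last" with one space.
change_name_vec <- function(x) {
  has_comma <- grepl(",", x, fixed = TRUE)
  # Split only the names that still need reordering
  parts <- strsplit(x[!has_comma], " ", fixed = TRUE)
  x[!has_comma] <- vapply(parts,
                          function(p) paste0(p[2], ", ", p[1]),
                          character(1))
  x
}

change_name_vec(c("Smith, John", "Sam Miller", "Susan Styles"))
# returns c("Smith, John", "Miller, Sam", "Styles, Susan")
```

With this version the last step would become `df %>% mutate(new_name = change_name_vec(Names))`.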