I have a dataframe like this.
name              zip
Mike Tyson        13756
Mohammed Ali      54412
Joe Frazier       47463
Floyd Mayweahter  34134
I would like to use the first character of both the first name and the last name, plus the last two characters of the zip code, to create a new variable:
name              zip    new.var
Mike Tyson        13756  MT56
Mohammed Ali      54412  MA12
Joe Frazier       47463  JF63
Floyd Mayweahter  34134  FM34
I searched for a similar question, but the one I found ("Extract characters from a column and create new variable") uses only one column.
CodePudding user response:
A base R way with regex:
transform(df, new.var = paste0(sub('^(.)\\w+\\s(.).*', '\\1\\2', name),
                               sub('.*(..)$', '\\1', zip)))
#              name   zip new.var
#1       Mike Tyson 13756    MT56
#2     Mohammed Ali 54412    MA12
#3      Joe Frazier 47463    JF63
#4 Floyd Mayweahter 34134    FM34
The first sub extracts the first character of each of the two words in the name column, and the second sub extracts the last two characters of the zip column. paste0 then combines the two results into a single column.
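To make the explanation above easier to follow, here is a minimal sketch that runs each step on its own. It assumes the name pattern is written with `\\w+` (one or more word characters before the space) and uses plain vectors copied from the question instead of a data frame; the variable names `initials`, `last_two` and `new_var` are only for illustration.

```r
# Sample values copied from the question.
name <- c("Mike Tyson", "Mohammed Ali", "Joe Frazier", "Floyd Mayweahter")
zip  <- c(13756L, 54412L, 47463L, 34134L)

# Step 1: capture the first character of the first word and the first
# character after the whitespace, and replace the whole string with them.
initials <- sub('^(.)\\w+\\s(.).*', '\\1\\2', name)

# Step 2: capture the last two characters of the zip code
# (sub() coerces the integer vector to character first).
last_two <- sub('.*(..)$', '\\1', zip)

# Step 3: paste the two pieces together element-wise.
new_var <- paste0(initials, last_two)
new_var
# [1] "MT56" "MA12" "JF63" "FM34"
```

Splitting the call this way makes it easier to check which of the two patterns is at fault if a name or zip code comes out unchanged.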
data
It is easier to help if you provide data in a reproducible format:
df <- structure(list(name = c("Mike Tyson", "Mohammed Ali", "Joe Frazier",
                              "Floyd Mayweahter"),
                     zip = c(13756L, 54412L, 47463L, 34134L)),
                class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
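Using stringr (with dplyr for mutate):
library(dplyr)
library(stringr)
df %>%
  mutate(new.var = paste(str_extract(name, "^."),                  # first character of the name
                         str_extract(sub(".* ", "", name), "^."),  # first character of the last word
                         str_extract(zip, "..$"),                  # last two characters of the zip
                         sep = ""))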
Output:
              name   zip new.var
1       Mike Tyson 13756    MT56
2     Mohammed Ali 54412    MA12
3      Joe Frazier 47463    JF63
4 Floyd Mayweahter 34134    FM34
CodePudding user response:
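Using base R string functions (gsub and substring) inside a dplyr pipeline:
library(dplyr)
df %>%
  mutate(mm = gsub('\\b(\\pL)\\pL{2,}|.', '\\U\\1', name, perl = TRUE)) %>%  # initials
  mutate(vv = substring(zip, 4, 6)) %>%                                      # characters 4 onward of the zip
  mutate(var = paste0(mm, vv)) %>%
  select('name', 'zip', 'var')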
              name   zip  var
1       Mike Tyson 13756 MT56
2     Mohammed Ali 54412 MA12
3      Joe Frazier 47463 JF63
4 Floyd Mayweahter 34134 FM34
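The gsub pattern above is fairly dense, so here is a small sketch of how its two alternatives appear to work, run on a plain vector copied from the question. The variable name `initials` and the lower-case test string are assumptions added for illustration.

```r
name <- c("Mike Tyson", "Mohammed Ali", "Joe Frazier", "Floyd Mayweahter")

# Alternative 1, \\b(\\pL)\\pL{2,}: a word boundary, one captured letter,
# then at least two more letters. The whole word is replaced by \\U\\1,
# that is, the captured first letter converted to upper case.
# Alternative 2, '.': any other single character, such as the space.
# Group 1 is empty in that case, so the character is deleted.
initials <- gsub('\\b(\\pL)\\pL{2,}|.', '\\U\\1', name, perl = TRUE)
initials
# [1] "MT" "MA" "JF" "FM"

# Because of \\U, lower-case input still yields upper-case initials.
lower_case <- gsub('\\b(\\pL)\\pL{2,}|.', '\\U\\1', "mike tyson", perl = TRUE)
lower_case
# [1] "MT"
```

One consequence of the `{2,}` quantifier is that a word with fewer than three letters does not match the first alternative, so it would most likely contribute no initial; the pattern may need adjusting if such names occur in the data.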
CodePudding user response:
We may do
library(dplyr)
library(stringr)
df %>%
  mutate(new.var = str_c(str_remove_all(name, "[a-z ]+"), substr(zip, 4, 5)))
              name   zip new.var
1       Mike Tyson 13756    MT56
2     Mohammed Ali 54412    MA12
3      Joe Frazier 47463    JF63
4 Floyd Mayweahter 34134    FM34
data
df <- structure(list(name = c("Mike Tyson", "Mohammed Ali", "Joe Frazier",
                              "Floyd Mayweahter"),
                     zip = c(13756L, 54412L, 47463L, 34134L)),
                class = "data.frame", row.names = c(NA, -4L))
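The substring() and substr() calls in the answers above start at position 4, which works because every zip code in the sample data has five characters. As a hedged alternative, the sketch below computes the start position from nchar() so it takes the last two characters whatever the length. The three-digit value 501L is a made-up example of a zip code that lost its leading zeros by being stored as an integer.

```r
zip <- c(13756L, 54412L, 47463L, 34134L)

# Convert once, then count back from the end of each string.
zip_chr  <- as.character(zip)
last_two <- substr(zip_chr, nchar(zip_chr) - 1, nchar(zip_chr))
last_two
# [1] "56" "12" "63" "34"

# Hypothetical shorter value: positions 4 to 5 would be empty here,
# while counting from the end still returns the last two characters.
short_chr <- as.character(501L)
short_two <- substr(short_chr, nchar(short_chr) - 1, nchar(short_chr))
short_two
# [1] "01"
```

The regex versions in the earlier answers (`'.*(..)$'` and `"..$"`) already anchor at the end of the string, so this only matters for the position-based variants.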