Combine characters from two columns and create a new column

Time:10-06

I have a dataframe like this.

name                zip   
Mike Tyson          13756
Mohammed Ali        54412
Joe Frazier         47463
Floyd Mayweahter    34134

I would like to use the first character of both the first name and the last name, together with the last two characters of the zip code, to create a new variable:

name                zip    new.var
Mike Tyson          13756  MT56
Mohammed Ali        54412  MA12
Joe Frazier         47463  JF63
Floyd Mayweahter    34134  FM34 

I searched for a similar question, but the one I found, "Extract characters from a column and create new variable", uses only one column.

CodePudding user response:

A base R way with regex -

transform(df, new.var = paste0(sub('^(.)\\w+\\s(.).*', '\\1\\2', name), 
                               sub('.*(..)$', '\\1', zip)))

#              name   zip new.var
#1       Mike Tyson 13756    MT56
#2     Mohammed Ali 54412    MA12
#3      Joe Frazier 47463    JF63
#4 Floyd Mayweahter 34134    FM34

The first sub call extracts the first character of each of the two words in the name column, and the second extracts the last two characters of the zip column. paste0 then combines them into a single column.
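
To see what each piece contributes, the two sub calls can be run on their own. This is a minimal sketch using two rows from the question's data; the intermediate names initials, zip_tail and new_var are introduced here only for illustration.

```r
# Two sample rows taken from the question's data.
name <- c("Mike Tyson", "Mohammed Ali")
zip  <- c(13756L, 54412L)

# Capture the first character of the first word and of the second word;
# everything else in the string is matched and discarded.
initials <- sub("^(.)\\w+\\s(.).*", "\\1\\2", name)

# Capture the last two characters. sub() coerces the integer zip
# to character before matching.
zip_tail <- sub(".*(..)$", "\\1", zip)

new_var <- paste0(initials, zip_tail)
new_var
```

Because the pattern expects exactly one whitespace character between the two words, a name that does not fit that shape is returned unchanged by sub rather than shortened.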

data

It is easier to help if you provide data in a reproducible format:

df <- structure(list(name = c("Mike Tyson", "Mohammed Ali", "Joe Frazier", 
"Floyd Mayweahter"), zip = c(13756L, 54412L, 47463L, 34134L)), 
class = "data.frame", row.names = c(NA, -4L))
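
A structure(...) call like the one above is what dput prints for a data frame, so that is one way to produce a reproducible format. A small sketch, assuming a two-row data frame built from the question's values (the name df_small is hypothetical):

```r
# Hypothetical two-row data frame using values from the question.
df_small <- data.frame(name = c("Mike Tyson", "Mohammed Ali"),
                       zip  = c(13756L, 54412L))

# Prints an R expression that recreates df_small when pasted into a session.
dput(df_small)

# Round trip: evaluating the printed text gives back the same object.
rebuilt <- eval(parse(text = capture.output(dput(df_small))))
identical(rebuilt, df_small)
```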

CodePudding user response:

Using stringr:

library(dplyr)
library(stringr)

df %>% 
  mutate(new.var = paste(str_extract(name, "^."),
                         str_extract(sub(".* ","", name),"^."),
                         str_extract(zip,"..$"),
                         sep = ""))

Output:

              name   zip new.var
1       Mike Tyson 13756    MT56
2     Mohammed Ali 54412    MA12
3      Joe Frazier 47463    JF63
4 Floyd Mayweahter 34134    FM34

CodePudding user response:

Using base R string functions (gsub and substring) inside a dplyr pipeline:

library(dplyr)

df %>% 
  mutate(mm = gsub('\\b(\\pL)\\pL{2,}|.', '\\U\\1', name, perl = TRUE)) %>% 
  mutate(vv = substring(zip, 4, 6)) %>% 
  mutate(var = paste0(mm, vv)) %>% 
  select('name', 'zip', 'var')

              name   zip  var
1       Mike Tyson 13756 MT56
2     Mohammed Ali 54412 MA12
3      Joe Frazier 47463 JF63
4 Floyd Mayweahter 34134 FM34
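
The gsub pattern in this answer is compact, so here is a sketch that runs it outside the pipeline with comments on each branch. It uses only base R and two rows from the question's data; the names initials and zip_tail are introduced for illustration.

```r
name <- c("Mike Tyson", "Joe Frazier")
zip  <- c(13756L, 47463L)

# perl = TRUE enables \pL (any letter) and \U (uppercase the replacement).
# Branch 1: a letter at a word boundary followed by two or more letters;
#           the whole word is replaced by its captured first letter.
# Branch 2: any other single character; group 1 is empty here, so the
#           character is replaced by nothing.
initials <- gsub("\\b(\\pL)\\pL{2,}|.", "\\U\\1", name, perl = TRUE)

# Characters 4 to 6 of the zip; a five-digit zip only has two left.
zip_tail <- substring(zip, 4, 6)

paste0(initials, zip_tail)
```

One consequence of the {2,} quantifier is that a word needs at least three letters to contribute an initial; shorter words fall through to the second branch and are dropped.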

CodePudding user response:

We may do

library(dplyr)
library(stringr)
df %>% 
    mutate(new.var = str_c(str_remove_all(name, "[a-z ]+"), substr(zip, 4, 5)))

Output:

              name   zip new.var
1       Mike Tyson 13756    MT56
2     Mohammed Ali 54412    MA12
3      Joe Frazier 47463    JF63
4 Floyd Mayweahter 34134    FM34
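
This approach keeps whatever is left after removing lowercase letters and spaces, so it relies on each name part starting with a capital and containing no other capitals. The sketch below illustrates that assumption with base R gsub standing in for str_remove_all; the helper strip_lower and the inputs other than "Mike Tyson" are hypothetical.

```r
# Base R stand-in for str_remove_all(name, "[a-z ]+"). perl = TRUE keeps
# the [a-z] range tied to ASCII code points rather than locale collation.
strip_lower <- function(x) gsub("[a-z ]+", "", x, perl = TRUE)

strip_lower("Mike Tyson")       # "MT"
strip_lower("Ronald McDonald")  # "RMD": the inner capital D survives
strip_lower("mike tyson")       # "": no capitals, so nothing is kept
```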

data

df <- structure(list(name = c("Mike Tyson", "Mohammed Ali", "Joe Frazier", 
"Floyd Mayweahter"), zip = c(13756L, 54412L, 47463L, 34134L)), 
class = "data.frame", row.names = c(NA, 
-4L))
Tags: r