Extract First and Last character vectors from a column that is a list: R

Time:01-02

I'm having difficulty applying the solutions suggested in answers to many similar questions. See the sample df below.

structure(list(FirstName = c("Albus Percival Wulfric Brian Dumbledore", 
"Harry James Potter", "Tom Marvollo Riddle", "Lord Voldemort"
), Email = c("[email protected]", "[email protected]", "[email protected]", 
"[email protected]"), ClassSection = c("HeadMaster", "Student", "Dark Lord in training", 
"Dark Lord")), row.names = c(NA, -4L), spec = structure(list(
    cols = list(FirstName = structure(list(), class = c("collector_character", 
    "collector")), Email = structure(list(), class = c("collector_character", 
    "collector")), ClassSection = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

I want to create a new column in which the first and last names are combined. For this, I first tried separate(FirstName, sep = " ", into = c("First", "Middle", "Last")). However, for names with more than three words the extra words are dropped, so I can't reliably combine the first and last names.

Next, I tried, df%>% mutate(First = str_split(FirstName, pattern = " ")). This gives a list of elements. I want a way to extract the first and the last element from this column.

# A tibble: 4 x 4
  FirstName                               Email               ClassSection          First    
  <chr>                                   <chr>               <chr>                 <list>   
1 Albus Percival Wulfric Brian Dumbledore [email protected] HeadMaster            <chr [5]>
2 Harry James Potter                      [email protected] Student               <chr [3]>
3 Tom Marvollo Riddle                     [email protected]   Dark Lord in training <chr [3]>
4 Lord Voldemort                          [email protected]          Dark Lord             <chr [2]>

I looked at various answers that suggested tail(First, n = 1) and dplyr's last(First). However, these don't give me the right answer. I also tried unnest_wider(First), but it has the same problem as separate(FirstName): I get multiple columns, and the last name does not land in a consistent column when a name has only two words or more than three.

I'd like to stay within the dplyr (tidyverse) workflow. Is there a way to get the first and last elements of each vector and combine them into a new column?

CodePudding user response:

Do you mean something like this?

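library(dplyr)
library(stringr)
# For each split name, keep the first and last word; unique() avoids
# repeating a one-word name. The \(z) lambda syntax needs R >= 4.1.
df %>%
  mutate(
    FirstLast = sapply(str_split(FirstName, pattern = " "),
                       \(z) paste(z[unique(c(1, length(z)))], collapse = ""))
  )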
# # A tibble: 4 × 4
#   FirstName                               Email               ClassSection          FirstLast      
#   <chr>                                   <chr>               <chr>                 <chr>          
# 1 Albus Percival Wulfric Brian Dumbledore [email protected] HeadMaster            AlbusDumbledore
# 2 Harry James Potter                      [email protected] Student               HarryPotter    
# 3 Tom Marvollo Riddle                     [email protected]   Dark Lord in training TomRiddle      
# 4 Lord Voldemort                          [email protected]          Dark Lord             LordVoldemort  

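or, much more simply (note that a two-word name has no middle part to match, so it is left unchanged and keeps its space):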

df %>%
  mutate(FirstLast = sub(" .* ", "", FirstName))
# # A tibble: 4 × 4
#   FirstName                               Email               ClassSection          FirstLast      
#   <chr>                                   <chr>               <chr>                 <chr>          
# 1 Albus Percival Wulfric Brian Dumbledore [email protected] HeadMaster            AlbusDumbledore
# 2 Harry James Potter                      [email protected] Student               HarryPotter    
# 3 Tom Marvollo Riddle                     [email protected]   Dark Lord in training TomRiddle      
# 4 Lord Voldemort                          [email protected]          Dark Lord             Lord Voldemort 
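
If two-word names should also come out joined, one possible variation is to capture the first and last words explicitly and make the middle part optional. This is only a sketch in base R; the vector full_names is a hypothetical stand-in for the FirstName column, and "Hagrid" is added just to show the single-word case.

```r
# Hypothetical sample names mirroring the question's FirstName column
full_names <- c("Albus Percival Wulfric Brian Dumbledore",
                "Harry James Potter",
                "Lord Voldemort",
                "Hagrid")

# Capture the first and the last whitespace-delimited word; the
# non-capturing group in the middle is optional, so two-word names match too.
# A name without any whitespace does not match and is returned unchanged.
first_last <- sub("^(\\S+)\\s+(?:.*\\s+)?(\\S+)$", "\\1\\2",
                  full_names, perl = TRUE)

print(first_last)
# [1] "AlbusDumbledore" "HarryPotter"     "LordVoldemort"   "Hagrid"
```

Using "\\1 \\2" as the replacement instead would keep a single space between the two words.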

CodePudding user response:

We may use extract

library(tidyr)
extract(df, FirstName, into = c("First", "Last"),
    "^(\\S+)\\s*.*\\s+(\\S+)$", remove = FALSE)

-output

# A tibble: 4 × 5
  FirstName                               First Last       Email               ClassSection         
  <chr>                                   <chr> <chr>      <chr>               <chr>                
1 Albus Percival Wulfric Brian Dumbledore Albus Dumbledore [email protected] HeadMaster           
2 Harry James Potter                      Harry Potter     [email protected] Student              
3 Tom Marvollo Riddle                     Tom   Riddle     [email protected]   Dark Lord in training
4 Lord Voldemort                          Lord  Voldemort  [email protected]          Dark Lord            

Or to extract from the list

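library(purrr)
library(dplyr)
library(stringr)  # str_split()
library(tidyr)    # unnest_wider()
df %>%
   # split each name into a character vector of words
   mutate(First = str_split(FirstName, pattern = " "), .after = FirstName) %>% 
   # keep only the first and last word of each vector
   mutate(First = map(First, ~ tibble(First = first(.x), 
       Last = last(.x)))) %>% 
   unnest_wider(First)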

-output

# A tibble: 4 × 5
  FirstName                               First Last       Email               ClassSection         
  <chr>                                   <chr> <chr>      <chr>               <chr>                
1 Albus Percival Wulfric Brian Dumbledore Albus Dumbledore [email protected] HeadMaster           
2 Harry James Potter                      Harry Potter     [email protected] Student              
3 Tom Marvollo Riddle                     Tom   Riddle     [email protected]   Dark Lord in training
4 Lord Voldemort                          Lord  Voldemort  [email protected]          Dark Lord            
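
The question asked for a single combined column, and the attempts with tail(First, n = 1) and last(First) fail because they act on the list column as a whole rather than on each row's vector. A minimal sketch of a per-row version, assuming purrr's map_chr is acceptable in the workflow, might look like the following. The small df and the column names Parts and FirstLast are made up for illustration.

```r
library(dplyr)
library(stringr)
library(purrr)

# Hypothetical stand-in for the question's df (name column only)
df <- tibble(FirstName = c("Albus Percival Wulfric Brian Dumbledore",
                           "Harry James Potter",
                           "Lord Voldemort"))

result <- df %>%
  mutate(
    # list column: one character vector of words per row
    Parts = str_split(FirstName, pattern = " "),
    # apply first()/last() to each row's vector, not to the whole column
    FirstLast = map_chr(Parts, ~ paste(first(.x), last(.x)))
  ) %>%
  select(-Parts)

result$FirstLast
# [1] "Albus Dumbledore" "Harry Potter"     "Lord Voldemort"
```

For a one-word name this sketch would repeat the word, since first() and last() then return the same element, so that case would need separate handling.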