How to get the first n characters from a string in R

I would like to extract the first three letters of each word in each row of df and join them with an underscore, as shown below.

Example:

df <- data.frame(name = c('Jame Bond', "Maria Taylor", "Micheal Balack"))
df
            name
1      Jame Bond
2   Maria Taylor
3 Micheal Balack

Desired output:

df_new 
        name
1      Jam_Bon
2      Mar_Tay
3      Mic_Bal

Any suggestions for doing this with the tidyverse?

CodePudding user response:

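library(stringr)
library(dplyr)
library(purrr)  # provides map_chr()

# Extract three letters at the start of the string or right after whitespace,
# then join the pieces for each name with an underscore
df$name %>% 
  str_extract_all("(?<=(^|[:space:]))[:alpha:]{3}") %>% 
  map_chr(~ str_c(.x, collapse = "_"))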

The stringr cheatsheet is very useful for working through these types of problems. https://www.rstudio.com/resources/cheatsheets/

Created on 2022-03-26 by the reprex package (v2.0.1)
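
The pipeline above returns a character vector rather than a data frame. If you want the df_new shape from the question, one way is to run the same pattern inside mutate(). This is only a sketch: it assumes dplyr, stringr and purrr are installed and that the column is called name, as in the question.

```r
library(dplyr)
library(stringr)
library(purrr)

df <- data.frame(name = c("Jame Bond", "Maria Taylor", "Micheal Balack"))

df_new <- df %>%
  mutate(
    # same pattern as in the answer: three letters at the start of the
    # string or directly after whitespace
    name = str_extract_all(name, "(?<=(^|[:space:]))[:alpha:]{3}") %>%
      # collapse the pieces found for each row into one string
      map_chr(~ str_c(.x, collapse = "_"))
  )

df_new
```

Note that the pattern needs exactly three letters, so a word shorter than that is skipped entirely.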

CodePudding user response:

You can try this with dplyr::rowwise(), stringr::str_split() and stringr::str_sub():

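library(dplyr)
library(stringr)

df_new <- df %>% 
  rowwise() %>%  # process one name at a time
  mutate(name = paste(
    unlist(
      # split on spaces, then keep the first three characters of each word
      lapply(str_split(name, ' '), function(x){
        str_sub(x, 1, 3)
      })
    ), 
    collapse = "_"  # rejoin the pieces with an underscore
  ))

This gives the result you expected: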

> df_new
# A tibble: 3 x 1
# Rowwise: 
  name   
  <chr>  
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
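
If you'd rather not depend on rowwise(), the same split, truncate and join steps can be sketched in base R. The helper name abbreviate_words and its arguments n and sep are made up for this illustration and are not part of any package. The n argument also covers the "first n characters" case from the title.

```r
df <- data.frame(name = c("Jame Bond", "Maria Taylor", "Micheal Balack"))

# Hypothetical helper: keep the first n characters of every
# space-separated word and join them with sep
abbreviate_words <- function(x, n = 3, sep = "_") {
  # as.character() guards against factor columns in older R versions
  words <- strsplit(as.character(x), " ", fixed = TRUE)
  vapply(
    words,
    function(w) paste(substr(w, 1, n), collapse = sep),
    character(1)
  )
}

df_new <- transform(df, name = abbreviate_words(name))
df_new
```

Because substr() simply returns whatever is available, words shorter than n characters are kept whole instead of being dropped.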

CodePudding user response:

An alternative method using tidyr functions:

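library(tidyr)

df |> 
  # capture the first three word characters of the first and last word
  extract(name, c("x1","x2"), "(\\w{3}).*\\s(\\w{3})") |> 
  # glue the two captured pieces back into a single name column
  unite(col = "name",x1,x2, sep = "_")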

Giving:

     name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal

Note that this assumes every first name and surname has at least three characters. If that is not the case, replace the extract regex with "(\\w{1,3}).*\\s(\\w{1,3})".
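
To see the difference between the two patterns, here is a small sketch with invented names shorter than three letters (the short data frame is made up for illustration and is not from the question). It assumes tidyr is installed and R 4.1 or later for the |> pipe.

```r
library(tidyr)

short <- data.frame(name = c("Al Ng", "Jo Smith"))

# Strict pattern: each part needs exactly three word characters,
# so neither row matches and both new columns come back as NA
strict <- short |>
  extract(name, c("x1", "x2"), "(\\w{3}).*\\s(\\w{3})")

# Relaxed pattern: one to three word characters per part
relaxed <- short |>
  extract(name, c("x1", "x2"), "(\\w{1,3}).*\\s(\\w{1,3})") |>
  unite(col = "name", x1, x2, sep = "_")

strict
relaxed  # name column: "Al_Ng", "Jo_Smi"
```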