I would like to extract the first three letters of each word in every row of df, as shown below.
Example:
df <- data.frame(name = c('Jame Bond', "Maria Taylor", "Micheal Balack"))
df
name
1 Jame Bond
2 Maria Taylor
3 Micheal Balack
Desired output:
df_new
name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
Any suggestions for doing this with the tidyverse?
CodePudding user response:
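library(stringr)
library(dplyr)
library(purrr)  # provides map_chr()

df$name %>%
  # grab three letters at the start of the string or right after whitespace
  str_extract_all("(?<=(^|[:space:]))[:alpha:]{3}") %>%
  # join the pieces found in each row with an underscore
  map_chr(~ str_c(.x, collapse = "_"))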
The stringr cheatsheet is very useful for working through these types of problems:
https://www.rstudio.com/resources/cheatsheets/
Created on 2022-03-26 by the reprex package (v2.0.1)
CodePudding user response:
You can try this with dplyr::rowwise(), stringr::str_split() and stringr::str_sub():
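library(dplyr)
library(stringr)

df_new <- df %>%
  rowwise() %>%
  mutate(name = paste(
    # split on spaces, then keep the first three characters of each word
    unlist(
      lapply(str_split(name, ' '), function(x) {
        str_sub(x, 1, 3)
      })
    ),
    collapse = "_"
  ))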
This gives the result you expected:
> df_new
# A tibble: 3 x 1
# Rowwise:
name
<chr>
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
CodePudding user response:
An alternative method using tidyr functions:
library(tidyr)

df |>
  extract(name, c("x1", "x2"), "(\\w{3}).*\\s(\\w{3})") |>
  unite(col = "name", x1, x2, sep = "_")
Giving:
name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
Note that this assumes every first name and surname has at least 3 characters; otherwise, replace the extract regex with "(\\w{1,3}).*\\s(\\w{1,3})".
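As a rough sketch of that relaxed regex in use: the data frame df_short and the name "Al Pacino" are made up here purely to illustrate a first name shorter than three characters, and are not part of the original question.

```r
library(tidyr)

# Hypothetical data: "Al" has only two characters
df_short <- data.frame(name = c("Jame Bond", "Al Pacino", "Micheal Balack"))

df_short_new <- df_short |>
  # {1,3} takes up to three word characters, so shorter names still match
  extract(name, c("x1", "x2"), "(\\w{1,3}).*\\s(\\w{1,3})") |>
  unite(col = "name", x1, x2, sep = "_")

df_short_new
#>      name
#> 1 Jam_Bon
#> 2  Al_Pac
#> 3 Mic_Bal
```

Because .* is greedy, a name with more than two words would be reduced to its first and last word under this pattern.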