I have a data frame of short forms such as:
Ann-e/i, which is the short form for Anne and Anni.
How can I replace the pattern -e/i in the data frame with the full forms? Another example is Matt-e/i for Matte and Matti.
Thanks in advance for any help!
Answer:
x <- c("Ann-e/i", "Matt-e/i")
# \\1 = the stem, \\2 and \\3 = the two alternative endings
gsub("(^[a-zA-Z]+)-([a-z])/([a-z])$", "\\1\\2 and \\1\\3", x)
# [1] "Anne and Anni" "Matte and Matti"
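Since the question mentions a data frame, here is a minimal sketch of applying the same substitution to a column. The data frame and the column names `short` and `full` are hypothetical; entries that don't match the pattern (like "Annerl") are left unchanged.

```r
# Hypothetical data frame; the column names `short` and `full` are assumptions
df <- data.frame(short = c("Ann-e/i", "Matt-e/i", "Annerl"),
                 stringsAsFactors = FALSE)

# Expand "Stem-x/y" into "Stemx and Stemy"; non-matching entries pass through
df$full <- sub("^([A-Za-z]+)-([a-z])/([a-z])$", "\\1\\2 and \\1\\3", df$short)
df
```

sub() is enough here because each cell contains at most one short form; gsub() gives the same result in this case.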
Answer:
Wimpel's suggestion using gsub() from base R works well and is quite flexible. Another approach uses the stringr package from the tidyverse, which some people find more intuitive.
library(stringr)
strings <- c("Ann-e/i", "Annerl", "Matt-e/i")
str_replace(strings, "(\\w+)-e/i", "\\1i or \\1e")
#> [1] "Anni or Anne" "Annerl" "Matti or Matte"
Created on 2021-11-08 by the reprex package (v2.0.1)
You'll find it helpful to learn about regular expressions (regex), if you're not already familiar with them. Since there are several varieties of regex with different syntax, here's a link that is specific to using it with stringr. https://stringr.tidyverse.org/articles/regular-expressions.html
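To see what the backreferences \\1, \\2 and \\3 actually refer to, stringr's str_match() returns the capture groups as a matrix. The sketch below is one way to use that, assuming the general pattern "stem-x/y" rather than the hard-coded -e/i; the variable names `parts` and `expanded` are just for illustration.

```r
library(stringr)

strings <- c("Ann-e/i", "Annerl", "Matt-e/i")

# str_match() returns a character matrix: column 1 is the whole match,
# columns 2-4 are the capture groups (stem, first ending, second ending).
# Rows that don't match the pattern are all NA.
parts <- str_match(strings, "^(\\w+)-(\\w)/(\\w)$")
parts

# Rebuild the expanded form where the pattern matched; keep the rest as-is
expanded <- ifelse(is.na(parts[, 1]),
                   strings,
                   paste0(parts[, 2], parts[, 3], " and ", parts[, 2], parts[, 4]))
expanded
```

This is more verbose than a single str_replace() call, but having the pieces in separate columns makes it easy to build other output formats from them.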
Answer:
If your values are comma-separated, you can do either of the following, depending on the outcome you want:
Data:
string <- c("Annerl,Ann-e/i", "Matt-e/i")
First solution:
sub("(^\\w+)-(\\w)/(\\w)$", "\\1\\2 and \\1\\3", unlist(strsplit(string, ",")))
# [1] "Annerl" "Anne and Anni" "Matte and Matti"
Second:
# Note: elements containing a comma come first in the result, so input order is not preserved
c(sub("(^\\w+),(\\w+)-(\\w)/(\\w)$", "\\1, \\2\\3 and \\2\\4", string[grepl(",", string)]),
  sub("(^\\w+)-(\\w)/(\\w)$", "\\1\\2 and \\1\\3", string[!grepl(",", string)]))
# [1] "Annerl, Anne and Anni" "Matte and Matti"
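If you want one output element per input element and need to keep the original order, a minimal sketch is to split each element on commas, expand every piece, and paste the pieces back together. The helper name `expand_item` is hypothetical, and this assumes each comma-separated piece is either a plain name or a single "stem-x/y" short form.

```r
string <- c("Annerl,Ann-e/i", "Matt-e/i")

# Hypothetical helper: expand one "Stem-x/y" item; other items are returned unchanged
expand_item <- function(item) {
  sub("^(\\w+)-(\\w)/(\\w)$", "\\1\\2 and \\1\\3", item)
}

# strsplit() gives a list with one character vector per input element;
# vapply() expands each piece and re-joins it, yielding one string per element
result <- vapply(strsplit(string, ","),
                 function(items) paste(expand_item(items), collapse = ", "),
                 character(1))
result
```

Unlike the grepl()-based version, this handles any number of comma-separated items per element and doesn't need a separate pattern for the comma case.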