Replace specific pattern (shortening notations) by full notation in R

Time:11-09

I have a data frame containing short forms such as the following:

Ann-e/i is the short form for Anne and Anni

How can I replace the pattern -e/i in the data frame by the full notations? Another example is Matt-e/i for Matte and Matti.

Thanks in advance for any help!

CodePudding user response:

x <- c("Ann-e/i", "Matt-e/i")
gsub("(^[a-zA-Z]+?)-([a-z])/([a-z])$", "\\1\\2 and \\1\\3", x)
[1] "Anne and Anni"   "Matte and Matti"
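
The question mentions a data frame, while the example above works on a character vector. A minimal sketch of applying the same kind of pattern to a column, assuming a hypothetical data frame `df` with a character column `name` (both names are illustrative and not taken from the question):

```r
# Hypothetical data frame; `df` and `name` are assumed names for illustration
df <- data.frame(name = c("Ann-e/i", "Matt-e/i"), stringsAsFactors = FALSE)

# Expand "stem-x/y" into "stemx and stemy" for every value in the column
df$name <- gsub("^([a-zA-Z]+)-([a-z])/([a-z])$", "\\1\\2 and \\1\\3", df$name)

df$name
# [1] "Anne and Anni"   "Matte and Matti"
```

Values that do not match the pattern are returned unchanged, so the call should be safe on a column that mixes short forms and full names.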

CodePudding user response:

Wimpel's suggestion using gsub from base R works well and is quite flexible. Another approach is provided by the package stringr from the tidyverse, which might be more intuitive.

library(stringr)

strings <- c("Ann-e/i", "Annerl", "Matt-e/i")
str_replace(strings, "(\\w+)-e/i", "\\1i or \\1e")
#> [1] "Anni or Anne"   "Annerl"         "Matti or Matte"

Created on 2021-11-08 by the reprex package (v2.0.1)

You'll find it helpful to learn about regular expressions (regex), if you're not already familiar with them. Since there are several varieties of regex with different syntax, here's a link that is specific to using it with stringr. https://stringr.tidyverse.org/articles/regular-expressions.html
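
To see what the capture groups in such a pattern pick up, the following sketch may help. It uses base R rather than stringr so that it runs without extra packages; the variable names `x`, `pattern` and `groups` are illustrative assumptions.

```r
x <- c("Ann-e/i", "Annerl", "Matt-e/i")

# Group 1: the stem; groups 2 and 3: the two alternative endings
pattern <- "(\\w+)-(\\w)/(\\w)"

# regexec() locates the match, regmatches() extracts the full match followed
# by each captured group; strings without a match give character(0)
groups <- regmatches(x, regexec(pattern, x))

groups[[1]]
#> [1] "Ann-e/i" "Ann"     "e"       "i"
```

The backreferences `\\1`, `\\2` and `\\3` in a replacement string refer to these groups in the same order.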

CodePudding user response:

If you have comma-separated values, you can use either of the following, depending on your desired outcome:

Data:

string <- c("Annerl,Ann-e/i", "Matt-e/i")

First solution:

sub("(^\\w+)-(\\w)/(\\w)$", "\\1\\2 and \\1\\3", unlist(strsplit(string, ",")))
# [1] "Annerl"          "Anne and Anni"   "Matte and Matti"

Second:

c(sub("(^\\w+),(\\w+)-(\\w)/(\\w)$", "\\1, \\2\\3 and \\2\\4", string[grepl(",", string)]),
  sub("(^\\w+)-(\\w)/(\\w)$", "\\1\\2 and \\1\\3", string[grep(",", string, invert = TRUE)]))
# [1] "Annerl, Anne and Anni" "Matte and Matti"
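
As a possible alternative to splitting the input into two groups, an unanchored pattern with `gsub` might cover both cases in one pass. This is only a sketch under the assumption that every short form has the shape stem-x/y with single-character endings; `expanded` is an illustrative name.

```r
string <- c("Annerl,Ann-e/i", "Matt-e/i")

# Unanchored pattern: expands every "stem-x/y" occurrence, wherever it appears
expanded <- gsub("(\\w+)-(\\w)/(\\w)", "\\1\\2 and \\1\\3", string)

# Put a single space after each comma to match the second solution's formatting
expanded <- gsub(", *", ", ", expanded)

expanded
# [1] "Annerl, Anne and Anni" "Matte and Matti"
```

Because the pattern is not anchored, it would also expand several short forms within the same comma-separated string, and the original element order is preserved.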