Replacing strings with replace_all that have a specific beginning

I have a data set with open answers and I'm working with R. What I want to do is group answers that have the same meaning but are sometimes spelled differently.

For example, there are these three open answers: "Anwalt", "Anwältin" and "Dozent/Anwalt". For each answer that contains the word stem "Anw", I want R to replace the whole answer with "Anwalt/Anwältin".

For "Anwalt" and "Anwältin", I tried this command:

offene_antworten$vb_wunsch <- str_replace_all(offene_antworten$vb_wunsch, c("(^Anw)" = "Anwalt/Anwältin"))

But it results in "Anwalt/Anwältinältin", and I still have no solution for "Dozent/Anwalt". I tried variations of the str_replace_all function and regular expressions, and read several blogs, but I can't find a solution.

Help is very much appreciated!

CodePudding user response：

Are you trying to replace every answer that contains "Anw" with "Anwalt/Anwältin"? If so, you can combine case_when() with str_detect():

library(tidyverse)

Consider this sample

# A tibble: 10 x 2
   question answer       
      <int> <chr>        
 1        1 Anwältin     
 2        2 Anwalt       
 3        3 Anwältin     
 4        4 Chocolate    
 5        5 Chocolate    
 6        6 Dozent/Anwalt
 7        7 Chocolate    
 8        8 Dozent/Anwalt
 9        9 Anwältin     
10       10 Anwalt       
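
The answer does not show how df was created. One way to rebuild the sample above, assuming the tibble package (part of the tidyverse) and the column names from the printout:

```r
library(tibble)

# Rebuild the sample shown above; the values are copied from the printout.
df <- tibble(
  question = 1:10,
  answer = c("Anwältin", "Anwalt", "Anwältin", "Chocolate", "Chocolate",
             "Dozent/Anwalt", "Chocolate", "Dozent/Anwalt", "Anwältin", "Anwalt")
)
```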

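# Replace any answer containing "anw" (case-insensitive) with the common label;
# all other answers are left unchanged.
df %>%  
  mutate(answer = case_when(str_detect(str_to_lower(answer), 
                                       "anw") ~ "Anwalt/Anwältin", 
                            TRUE ~ answer))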

# A tibble: 10 x 2
   question answer         
      <int> <chr>          
 1        1 Anwalt/Anwältin
 2        2 Anwalt/Anwältin
 3        3 Anwalt/Anwältin
 4        4 Chocolate      
 5        5 Chocolate      
 6        6 Anwalt/Anwältin
 7        7 Chocolate      
 8        8 Anwalt/Anwältin
 9        9 Anwalt/Anwältin
10       10 Anwalt/Anwältin
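
As for why the original attempt produced "Anwalt/Anwältinältin": str_replace_all() replaces only the part of the string that the pattern matches, so "^Anw" swaps out the first three characters and leaves the rest ("ältin") in place. If you would rather stay with str_replace_all(), one option is a pattern that covers the whole string. This is a minimal sketch, assuming stringr is installed; the vector x is a made-up example, not the original data:

```r
library(stringr)

x <- c("Anwalt", "Anwältin", "Dozent/Anwalt", "Chocolate")  # hypothetical sample

# Original pattern: only the leading "Anw" is replaced, the rest of the word stays.
partial <- str_replace_all(x, "^Anw", "Anwalt/Anwältin")

# "^.*Anw.*$" matches the entire string whenever it contains "Anw",
# so the whole answer is replaced, including "Dozent/Anwalt".
whole <- str_replace_all(x, "^.*Anw.*$", "Anwalt/Anwältin")
```

Unlike the case_when() version above, this pattern is case-sensitive.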

CodePudding user response：

# Case-sensitive match: only entries containing "Anw" (capital A) are replaced
char <- c("Anwalt", "Anwältin", "Dozent/Anwalt", "anw", "wAn", "abcd")
char[grepl("Anw", char)] <- "Anwalt/Anwältin"

> char
[1] "Anwalt/Anwältin" "Anwalt/Anwältin" "Anwalt/Anwältin" "anw"            
[5] "wAn"             "abcd"  


# Case-insensitive match: convert to lower case before matching
char2 <- char
char2[grepl("anw", tolower(char2))] <- "Anwalt/Anwältin"

> char2
[1] "Anwalt/Anwältin" "Anwalt/Anwältin" "Anwalt/Anwältin" "Anwalt/Anwältin"
[5] "wAn"             "abcd"
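
The case-insensitive variant can also be written without tolower(), since grepl() has an ignore.case argument. A small sketch in base R, reusing the vector from above (char3 is just an illustrative name):

```r
char <- c("Anwalt", "Anwältin", "Dozent/Anwalt", "anw", "wAn", "abcd")

# ignore.case = TRUE lets the pattern match "Anw", "anw", "ANW", and so on.
char3 <- char
char3[grepl("anw", char3, ignore.case = TRUE)] <- "Anwalt/Anwältin"
```

Here "wAn" stays untouched because it does not contain the letters a, n, w in that order.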