I have a data set with open-ended answers and I'm working in R. I want to consolidate answers that have the same meaning but are sometimes spelled differently.
For example, there are these three open answers: "Anwalt", "Anwältin" and "Dozent/Anwalt". For every answer that contains the word stem "Anw", I want R to replace the whole answer with "Anwalt/Anwältin".
For "Anwalt" and "Anwältin", I tried this command:
offene_antworten$vb_wunsch <- str_replace_all(offene_antworten$vb_wunsch, c("(^Anw)" = "Anwalt/Anwältin"))
But it results in "Anwalt/Anwältinältin", and I still have no solution for "Dozent/Anwalt". I tried variations of the str_replace_all function and of the regular expression, and read several blogs, but I can't find a solution.
Help is very much appreciated!
CodePudding user response:
Are you trying to replace every answer that contains "Anw" with "Anwalt/Anwältin"? If so, you can do the following:
library(tidyverse)
Consider this sample:
# A tibble: 10 x 2
   question answer
      <int> <chr>
 1        1 Anwältin
 2        2 Anwalt
 3        3 Anwältin
 4        4 Chocolate
 5        5 Chocolate
 6        6 Dozent/Anwalt
 7        7 Chocolate
 8        8 Dozent/Anwalt
 9        9 Anwältin
10       10 Anwalt
df %>%
  mutate(answer = case_when(
    str_detect(str_to_lower(answer), "anw") ~ "Anwalt/Anwältin",
    TRUE ~ answer
  ))
# A tibble: 10 x 2
   question answer
      <int> <chr>
 1        1 Anwalt/Anwältin
 2        2 Anwalt/Anwältin
 3        3 Anwalt/Anwältin
 4        4 Chocolate
 5        5 Chocolate
 6        6 Anwalt/Anwältin
 7        7 Chocolate
 8        8 Anwalt/Anwältin
 9        9 Anwalt/Anwältin
10       10 Anwalt/Anwältin
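As a side note on why the original attempt produced "Anwalt/Anwältinältin": str_replace_all() substitutes only the part of the string that the pattern matches, so "^Anw" is swapped out and the remaining "ältin" is left in place. A minimal sketch of one way around that, assuming stringr is installed, is to let the pattern cover the whole string. The vector vb_wunsch below is a hypothetical stand-in for offene_antworten$vb_wunsch.

```r
library(stringr)

# Hypothetical sample standing in for offene_antworten$vb_wunsch
vb_wunsch <- c("Anwalt", "Anwältin", "Dozent/Anwalt", "Chocolate")

# Only the matched "Anw" at the start is replaced; the rest of the string stays
partial <- str_replace_all(vb_wunsch, "^Anw", "Anwalt/Anwältin")

# The pattern spans the whole string, so the entire answer is replaced
full <- str_replace(vb_wunsch, "^.*Anw.*$", "Anwalt/Anwältin")

print(partial)
print(full)
```

Unlike the case_when() version above, this pattern is case-sensitive, so a lower-case "anw" would be left untouched.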
CodePudding user response:
# Case-sensitive match: the lower-case "anw" is not replaced
char <- c("Anwalt", "Anwältin", "Dozent/Anwalt", "anw", "wAn", "abcd")
char[grepl("Anw", char)] <- "Anwalt/Anwältin"
char
# [1] "Anwalt/Anwältin" "Anwalt/Anwältin" "Anwalt/Anwältin" "anw"
# [5] "wAn"             "abcd"
# Case-insensitive match: lower-case the strings before matching
char2 <- char
char2[grepl("anw", tolower(char2))] <- "Anwalt/Anwältin"
char2
# [1] "Anwalt/Anwältin" "Anwalt/Anwältin" "Anwalt/Anwältin" "Anwalt/Anwältin"
# [5] "wAn"             "abcd"
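The same idea can be applied directly to a data frame column. The sketch below uses base R only and relies on the ignore.case argument of grepl() instead of tolower(). The data frame is a hypothetical reconstruction that borrows the column name from the question, not the asker's actual data.

```r
# Hypothetical data frame mirroring offene_antworten$vb_wunsch from the question
offene_antworten <- data.frame(
  vb_wunsch = c("Anwalt", "Anwältin", "Dozent/Anwalt", "anw", "wAn", "abcd"),
  stringsAsFactors = FALSE
)

# TRUE for every answer containing "anw", regardless of case
hits <- grepl("anw", offene_antworten$vb_wunsch, ignore.case = TRUE)

# Overwrite the whole answer wherever the stem was found
offene_antworten$vb_wunsch[hits] <- "Anwalt/Anwältin"

print(offene_antworten)
```

Because the replacement is done by logical indexing rather than by substitution, the whole answer is overwritten and no leftover characters such as "ältin" can remain.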