I have the results of a survey in which many answers contain errors, such as misspellings, inconsistent UPPER/lower case, ...
So I need some kind of find-and-replace solution. (I've found a few candidate functions, but none of them seemed to work; I'm a bit of a noob.)
Instead of finding and replacing one by one, though, I would like to create a vector (?) of "mistakes" and replace them all with the correct answer, tidying the text so that I can visualize the results later.
Here is what I tried.
Consider VAR1 as the answers:
VAR1 <- c("motorbyke","motor bike","Mbike","Motor B","Motor","Bike")
I would like to change the misspelled answers to the correct one, let's say "motorbike":
DB %>%
mutate(VAR1 = replace(VAR1, VAR1 == "misspelling", "correct answer"))
but there are too many errors to fix them one at a time...
Is there any solution to my dilemma?
Thank you
EDIT: tried to add an example
CodePudding user response:
Here's one possible solution using the tidyverse and left_join:
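library(dplyr)
# Example data: the misspelled answers plus ten unrelated words
DB <- data.frame(
  VAR1 = c(c("motorbyke", "motor bike", "Mbike", "Motor B", "Motor", "Bike"),
           sample(stringr::words, 10)))
# Lookup table: one row per known mistake, paired with its correction
correction_df <- data.frame(
  correction = "motorbike",
  incorrect = c("motorbyke", "motor bike", "Mbike", "Motor B", "Motor", "Bike")
)
DB %>%
  left_join(correction_df, by = c(VAR1 = "incorrect")) %>%
  # Keep the original value when no correction was found
  mutate(VAR1 = ifelse(is.na(correction), VAR1, correction)) %>%
  select(-correction)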
New entries can be added to correction_df using the same syntax. Alternatively, the fuzzyjoin package does something very similar and might automate some of the corrections you're interested in.
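To give a feel for what automating corrections could look like, here is a minimal sketch of the underlying idea using base R's adist (edit distance) rather than fuzzyjoin itself, so it is a stand-in and not that package's API. The sample answers, the max_dist threshold of 2, and all variable names are assumptions made for illustration.

```r
# Hypothetical answers: two near-misses, one distant variant, one unrelated word
answers <- c("motorbyke", "motor bike", "Mbike", "car")
canonical <- "motorbike"

# Assumed threshold: treat answers within 2 edits of the canonical form as typos
max_dist <- 2

# Edit distance from each answer to the canonical spelling, ignoring case
distances <- drop(adist(answers, canonical, ignore.case = TRUE))

# Replace only the answers that are close enough; leave the rest untouched
cleaned <- ifelse(distances <= max_dist, canonical, answers)
print(cleaned)
```

With this threshold, "Mbike" is left alone because it is more than two edits away from "motorbike", so abbreviations like that would probably still need an explicit row in the lookup table. Raising the threshold catches more variants but also risks rewriting legitimate answers.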
CodePudding user response:
You could build a pattern for str_replace from your vector of mistakes and then replace every match with motorbike (in a column, a vector, etc.):
VAR1 <- c("motorbyke","motor bike","Mbike","Motor B","Motor","Bike")
my_pattern <- paste(VAR1, collapse = "|")
library(stringr)
str_replace(VAR1, my_pattern, 'motorbike')
output:
[1] "motorbike" "motorbike" "motorbike" "motorbike" "motorbike" "motorbike"
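One caveat: the pattern above is not anchored, so it also matches inside longer answers (an answer such as "Biker" contains "Bike"). A possible variation, sketched here with base R's sub to keep it dependency-free, wraps the alternation in ^( ... )$ so that only whole answers are replaced. The extra answers "Biker" and "Motorcycle" and the variable names are made up for illustration, and the sketch assumes none of the mistakes contain regex metacharacters.

```r
# The six known misspellings from the question
mistakes <- c("motorbyke", "motor bike", "Mbike", "Motor B", "Motor", "Bike")

# Anchor the alternation so that only whole answers match
anchored_pattern <- paste0("^(", paste(mistakes, collapse = "|"), ")$")

# Hypothetical extra answers that merely contain one of the misspellings
answers <- c(mistakes, "Biker", "Motorcycle")

# Whole-answer matches become "motorbike"; everything else is left as-is
cleaned <- sub(anchored_pattern, "motorbike", answers)
print(cleaned)
```

Here the six listed mistakes become "motorbike", while "Biker" and "Motorcycle" are returned unchanged. The same anchored pattern should also work as the pattern argument of str_replace.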