I've been trying for a long time to find a relatively simple command that strips non-alphabetic characters from the beginning and end of each string. Importantly, non-alphabetic characters such as digits must be kept when they appear inside the text.
Let me give you an example:
a <- c("1) dog with 4 legs", "- cat with 1 tail", "2./ bird with 2 wings.")
b <- c("07 mouse with 1 tail.", "2.pig with 1 nose,,", "$ cow with 4 spots_")
data <- data.frame(a, b)
The desired result is:
a <- c("dog with 4 legs", "cat with 1 tail", "bird with 2 wings")
b <- c("mouse with 1 tail", "pig with 1 nose", "cow with 4 spots")
data_cleaned <- data.frame(a, b)
Is there a simple solution?
Answer:
We could do something like this:
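First we replace every punctuation character with a space, then we remove the first space-delimited token (the leftover numbering or symbol), and finally trim the remaining whitespace. This assumes every string starts with such a token:
library(dplyr)
library(stringr)
data %>%
  mutate(across(c(a, b), ~ .x %>%
    str_replace_all("[[:punct:]]", " ") %>%  # punctuation -> space
    str_replace("^\\S* ", "") %>%            # drop the first token
    str_squish()))                           # trim leftover spaces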
                  a                 b
1   dog with 4 legs mouse with 1 tail
2   cat with 1 tail   pig with 1 nose
3 bird with 2 wings  cow with 4 spots
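A caveat on the first approach: "^\\S* " always deletes the first token, so a string with no leading junk (for example "dog's 3 toys!") would lose "dog", and internal punctuation such as the apostrophe is turned into a space. A minimal sketch of an alternative, assuming stringr is installed: anchor the removal at each end instead. strip_ends() is a hypothetical helper name used only for illustration.

```r
library(stringr)

# Hypothetical helper: remove runs of non-letters only at the two ends.
# Strings that already start/end with a letter are left unchanged, and
# digits or punctuation inside the text survive.
strip_ends <- function(x) {
  x <- str_remove(x, "^[^[:alpha:]]+")  # leading non-letters
  str_remove(x, "[^[:alpha:]]+$")       # trailing non-letters
}

x <- c("1) dog with 4 legs", "2.pig with 1 nose,,", "dog's 3 toys!")
strip_ends(x)
```

It can be applied to the data frame the same way, e.g. with mutate(across(c(a, b), strip_ends)).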
Answer:
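We may also use base R trimws() on the whole data frame; its whitespace argument is a one-character class that is stripped repeatedly from both ends:
data[] <- trimws(as.matrix(data), whitespace = "[[:punct:]0-9 ]")
Output: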
> data
                  a                 b
1   dog with 4 legs mouse with 1 tail
2   cat with 1 tail   pig with 1 nose
3 bird with 2 wings  cow with 4 spots
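For reference, a rough base R sketch of what trimws(x, whitespace = w) does when trimming both ends: it appends "+" to the one-character pattern w, anchors it at the start and at the end, and deletes the matches with sub(perl = TRUE). This is why only the edges are touched and the digits inside the text are kept. trim_class() is a hypothetical name used only for illustration.

```r
# Hypothetical re-implementation of trimws(which = "both") with a custom class.
trim_class <- function(x, w) {
  x <- sub(paste0("^", w, "+"), "", x, perl = TRUE)  # leading run
  sub(paste0(w, "+$"), "", x, perl = TRUE)           # trailing run
}

x <- c("2./ bird with 2 wings.", "$ cow with 4 spots_")
w <- "[[:punct:]0-9 ]"
trim_class(x, w)
identical(trim_class(x, w), trimws(x, whitespace = w))
```

Note that a trailing number (e.g. "tail 4") would also be stripped, which matches the question's requirement of removing all non-alphabetic characters at the ends.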
Answer:
You could use trimws() with a whitespace class that matches any non-letter:
data[1:2] <- lapply(data[1:2], trimws, whitespace = "[^A-Za-z]")
data
#                   a                 b
# 1   dog with 4 legs mouse with 1 tail
# 2   cat with 1 tail   pig with 1 nose
# 3 bird with 2 wings  cow with 4 spots
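Its dplyr equivalent is:
library(dplyr)
data %>%
  # a lambda avoids passing extra arguments through across()'s `...`,
  # which recent dplyr versions deprecate
  mutate(across(a:b, ~ trimws(.x, whitespace = "[^A-Za-z]")))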
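If the data frame has more columns than a and b, one option is to select the text columns with where(is.character) instead of naming them. A minimal sketch, assuming R 4.0 or later (so data.frame() keeps strings as character) and dplyr 1.0 or later; the id column is hypothetical and only shows that non-text columns are left alone.

```r
library(dplyr)

data <- data.frame(
  id = 1:3,  # hypothetical non-text column, left untouched
  a = c("1) dog with 4 legs", "- cat with 1 tail", "2./ bird with 2 wings."),
  b = c("07 mouse with 1 tail.", "2.pig with 1 nose,,", "$ cow with 4 spots_")
)

# Trim non-letters from both ends of every character column.
data_cleaned <- data %>%
  mutate(across(where(is.character), ~ trimws(.x, whitespace = "[^A-Za-z]")))

data_cleaned
```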