Clear contaminated vector of Strings in R

Time:01-03

I have a vector of strings in R contaminated with the unwanted substrings "X." and ".", like this:

"age", ".name", "X.marks", "X.study.time", "class", "X.number"

And I want to parse the string data to:

"age", "name", "marks", "study time", "class", "number"

That is, I want to remove "X." where it exists, drop a leading ".", and replace every remaining "." with " " (a space). How can I do this in R?

CodePudding user response:

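We may use sub to remove a leading "X." (or a bare leading "."), then gsub with fixed = TRUE to replace the remaining dots with spaces: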

gsub(".", " ", sub("^X?\\.", "", v1), fixed = TRUE)
[1] "age"        "name"       "marks"      "study time" "class"      "number"   

data

v1 <- c("age", ".name", "X.marks", "X.study.time", "class", "X.number")
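
The two steps in that one-liner can also be wrapped in a small helper. This is only a minimal sketch in base R; the function name clean_names is hypothetical and not part of the answer above.

```r
# Hypothetical helper (the name is illustrative, not from the original answer)
clean_names <- function(x) {
  # Step 1: drop a leading "X." or a bare leading "." (regex anchored at the start)
  stripped <- sub("^X?\\.", "", x)
  # Step 2: replace every remaining literal "." with a space (fixed = TRUE, no regex)
  gsub(".", " ", stripped, fixed = TRUE)
}

v1 <- c("age", ".name", "X.marks", "X.study.time", "class", "X.number")
clean_names(v1)
```

Anchoring the first pattern with ^ means only the prefix is removed; fixed = TRUE in the second step avoids having to escape the dot.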

CodePudding user response:

You can do the desired substitution with the str_replace_all function from the stringr package. Using the v1 object posted by akrun:

library(stringr)
# Patterns in the named vector are applied in order:
# first drop a leading "X." or ".", then turn the remaining "." into spaces
str_replace_all(v1, c("^X?\\." = "", "\\." = " "))

# "age"        "name"       "marks"      "study time" "class"      "number" 

CodePudding user response:

Here is another stringr solution that combines two functions, str_replace_all and str_trim:

library(stringr)
str_trim(str_replace_all(v1, "\\.|X", " "))
[1] "age"        "name"       "marks"      "study time" "class"      "number" 
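
One caveat worth checking before using the pattern "\\.|X": it replaces every capital "X", not only the prefix. The sketch below illustrates this in base R, with trimws standing in for str_trim; v2 is made-up example data, not from the question.

```r
# Hypothetical input: "maX.value" contains an "X" that is not part of a prefix
v2 <- c("X.marks", "maX.value")

# Unanchored: every "X" and "." becomes a space, then the ends are trimmed
unanchored <- trimws(gsub("\\.|X", " ", v2))

# Anchored: only a leading "X." (or ".") is removed, other dots become spaces
anchored <- gsub(".", " ", sub("^X?\\.", "", v2), fixed = TRUE)

unanchored  # "marks" "ma  value"
anchored    # "marks" "maX value"
```

For the data in the question both approaches give the same result; they differ only when "X" occurs elsewhere in a string.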