Improving the readability of automated text generation based on a database query

Time:01-01

I am trying to improve the readability of automated text generation based on a database query.

Is there a neat way to perform these substitutions, that is, to do the following in one command instead of six?

x<-c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
out<-c("Test", "Test", "Test", "Test", "Test,", "Test, ", "Test,") 

x <- gsub(pattern = "( ", replacement = "(", x, fixed = TRUE)
x <- gsub(pattern = " )", replacement = ")", x, fixed = TRUE)
x <- gsub(pattern = " ,", replacement = ",", x, fixed = TRUE)
x <- gsub(pattern = "()", replacement = "", x, fixed = TRUE)
x <- gsub(pattern = ",,", replacement = ",", x, fixed = TRUE)
x <- gsub(pattern = " ,", replacement = ",", x, fixed = TRUE)

CodePudding user response:

You can use

x<-c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
gsub("\\(\\s*\\)|\\s+(?=[,)])|(?<=\\()\\s+|(,),+", "\\1", x, perl=TRUE)
# => [1] "Test"   "Test"   "Test "  "Test "  "Test,"  "Test, " "Test, "

Details:

  • \(\s*\)| - a ( char, zero or more whitespaces and then a ) char, or
  • \s+(?=[,)])| - one or more whitespaces immediately followed by either , or ) (the lookahead does not consume that char), or
  • (?<=\()\s+| - one or more whitespaces immediately preceded by a ( char, or
  • (,),+ - a comma captured into Group 1 and then one or more commas.

The replacement is the Group 1 value: if Group 1 matched, the replacement is a single comma; otherwise it is an empty string, so the match is simply removed.
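The alternation described above can be easier to read and maintain when each branch sits on its own line. The following is a minimal sketch of that idea; the helper name tidy_punct is hypothetical and not part of the original answer.

```r
# Hypothetical helper: builds the alternation branch by branch and applies it
# with a single gsub() call using the PCRE engine (perl = TRUE).
tidy_punct <- function(x) {
  pattern <- paste(
    "\\(\\s*\\)",     # empty parentheses, optionally with whitespace inside
    "\\s+(?=[,)])",   # whitespace directly before a comma or a closing parenthesis
    "(?<=\\()\\s+",   # whitespace directly after an opening parenthesis
    "(,),+",          # a run of commas; the first one is captured as Group 1
    sep = "|"
  )
  # "\\1" is the comma when the last branch matched, and empty otherwise
  gsub(pattern, "\\1", x, perl = TRUE)
}

x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
result <- tidy_punct(x)
print(result)
# => [1] "Test"   "Test"   "Test "  "Test "  "Test,"  "Test, " "Test, "
```

Note that a space left before a removed "()" is not trimmed, which is why "Test ()" becomes "Test " rather than "Test".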

CodePudding user response:

You can use the multigsub function, which is a wrapper around the gsub function in R (it is available in the qdap package).

Here's the code:

multigsub(c("( ", " )", " ,", "()", ",,", " ,"), c("(", ")", ",", "", ",", ","), x, fixed = TRUE)
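For comparison, the same sequence of fixed substitutions can be sketched in base R without any extra package by folding over the pattern/replacement pairs with Reduce. This is only an illustration under that assumption; the helper name seq_gsub is hypothetical.

```r
# Pattern/replacement pairs taken from the six gsub() calls in the question
patterns     <- c("( ", " )", " ,", "()", ",,", " ,")
replacements <- c("(",  ")",  ",",  "",   ",",  ",")

# Hypothetical helper: applies the substitutions one after another, in order,
# so each step sees the result of the previous one.
seq_gsub <- function(x, patterns, replacements) {
  Reduce(
    function(acc, i) gsub(patterns[i], replacements[i], acc, fixed = TRUE),
    seq_along(patterns),
    x  # initial value: the untouched input vector
  )
}

x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
result <- seq_gsub(x, patterns, replacements)
print(result)
# => [1] "Test"   "Test"   "Test "  "Test "  "Test,"  "Test, " "Test, "
```

Because the steps run strictly in order, "( )" is first shortened to "()" and then removed by the later "()" rule.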

CodePudding user response:

You can use mgsub::mgsub.

a <- c("( ", " )", " ,", "()", ",,") # patterns
b <- c("(", ")", ",", "", ",")       # replacements
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")

mgsub::mgsub(x, a, b, fixed = TRUE)
#[1] "Te()st"  "Test"    "Test "   "Test ()" "Test,,"  "Test, "  "Test, " 

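mgsub applies the substitutions simultaneously rather than one after another, so text produced by one replacement (for example the "()" left over from "( )") is not processed again. You might want to add other patterns to get the output you want.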
