Put spaces around all punctuation but excluding apostrophes

Time:03-20

I'm new to this, so I'm sorry if this is a stupid question... I need help with a bit of code in R...

I have a bit of code (below) which puts a space around all the punctuation in all txt files in a folder. It's lovely, but I don't want it to add spaces around apostrophes (').

Can anybody help me exclude apostrophes in that bit, gsub("(\\. |[[:punct:]])", " \\1 ", tx)? Or is that how you would do it (with [^ )?

I get this: "I want : spaces around all these marks ; : ! ? . but i didn ’ t want it there in didn ’ t"

I want this: "I want : spaces around all these marks ; : ! ? . but i didn’t want it there in didn’t"

for(file in filelist){
  tx=readLines(file)
  tx2=gsub("(\\. |[[:punct:]])", " \\1 ", tx)
  writeLines(tx2, con=file)
}

CodePudding user response:

We can match the apostrophe (’) first and discard that match with (*SKIP)(*FAIL), so that only the remaining punctuation is captured and padded with spaces:

gsub("’(*SKIP)(*FAIL)|([[:punct:].])", " \\1 ", tx, perl = TRUE)

-output

[1] "I want : spaces around all these marks ;  :  !  ?  .  but i didn’t want it there in didn’t"

data

tx <- "I want:spaces around all these marks;:!?. but i didn’t want it there in didn’t"
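The pattern above names only the curly apostrophe (’), so a straight ' would still get spaces around it. Below is a minimal sketch of the same SKIP/FAIL idea covering both forms. The helper name space_punct is hypothetical (it is not from the answer), and the sketch assumes the script and the text are both UTF-8.

```r
# Hypothetical helper: pad punctuation with spaces but leave apostrophes alone.
space_punct <- function(x) {
  # ['’] matches a straight or a curly apostrophe; (*SKIP)(*FAIL) throws that
  # match away, so only punctuation captured in group 1 is ever replaced.
  gsub("['’](*SKIP)(*FAIL)|([[:punct:]])", " \\1 ", x, perl = TRUE)
}

tx <- "I want:spaces but i didn’t want it in didn't"
space_punct(tx)
# [1] "I want : spaces but i didn’t want it in didn't"
```

Inside the loop from the question, this would stand in for the gsub line: tx2 <- space_punct(tx).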

CodePudding user response:

You can use

tx <- "I want: spaces around all these marks;:!?.but i didn’t want it there in didn't"
gsub("\\s*(\\. |[[:punct:]])(?<!\\b['’]\\b)\\s*", " \\1 ", tx, perl=TRUE)
## => [1] "I want : spaces around all these marks ;  :  !  ?  . but i didn’t want it there in didn't"

The perl=TRUE argument only means that the regex is handled by the PCRE library (note that the PCRE regex engine is not the same as the Perl regex engine).

Details:

  • \s* - zero or more whitespaces
  • (\. |[[:punct:]]) - Group 1 (\1): a dot followed by a space, or any single punctuation char
  • (?<!\b['’]\b) - immediately on the left, there must be no ' or ’ enclosed with word chars
  • \s* - zero or more whitespaces
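To see what the lookbehind in the list above buys you, here is a small sketch with two made-up input strings (they are illustrations, not data from the question). An apostrophe between two word characters is left alone, while a quote mark at the edge of a word is still padded like any other punctuation.

```r
# Pattern from the answer above; the lookbehind rejects an apostrophe
# (straight or curly) that sits between two word characters.
pat <- "\\s*(\\. |[[:punct:]])(?<!\\b['’]\\b)\\s*"

# Apostrophe inside a word: untouched; the colon is padded.
in_word <- gsub(pat, " \\1 ", "it's here:now", perl = TRUE)
in_word
# [1] "it's here : now"

# Quote marks that are not between two word characters are still padded.
# trimws() only removes the trailing space left by the last replacement.
quoted <- trimws(gsub(pat, " \\1 ", "say 'hi'", perl = TRUE))
quoted
# [1] "say ' hi '"
```

This is the main behavioural difference from skipping every apostrophe outright: quotation marks are treated as punctuation, contractions are not.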