I'm new to this, so I'm sorry if this is a stupid question. I need help with a bit of R code.
I have a bit of code (below) that puts a space around all punctuation in every txt file in a folder. It works nicely, but I don't want it to add spaces around apostrophes (' or ’).
Can anybody help me exclude apostrophes in the gsub("(\\. |[[:punct:]])", " \\1 ", tx) call? Or would you do it with a negated class ([^...])?
I get this: "I want : spaces around all these marks ; : ! ? . but i didn ’ t want it there in didn ’ t"
I want this: "I want : spaces around all these marks ; : ! ? . but i didn’t want it there in didn’t"
for (file in filelist) {
  tx <- readLines(file)
  # put a space on each side of every punctuation character
  tx2 <- gsub("(\\. |[[:punct:]])", " \\1 ", tx)
  writeLines(tx2, con = file)
}
CodePudding user response:
We can match the ’ and skip it with (*SKIP)(*FAIL) before matching all other punctuation:
gsub("’(*SKIP)(*FAIL)|([[:punct:].])", " \\1 ", tx, perl = TRUE)
-output
[1] "I want : spaces around all these marks ; : ! ? . but i didn’t want it there in didn’t"
data
tx <- "I want:spaces around all these marks;:!?. but i didn’t want it there in didn’t"
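As a rough sketch of how the same idea could be plugged back into the loop from the question: the helper name space_punct and the temporary demo file are assumptions made for illustration, not part of the original answer. The pattern here also skips the straight apostrophe ('), which the answer above does not cover.

```r
# Hypothetical helper (not from the answer): spaces out punctuation but
# leaves both apostrophe forms, ' and U+2019, untouched.
space_punct <- function(x) {
  gsub("['\u2019](*SKIP)(*FAIL)|([[:punct:]])", " \\1 ", x, perl = TRUE)
}

space_punct("Wait:don\u2019t go;it's fine")
# [1] "Wait : don’t go ; it's fine"

# Same idea applied to a file; a temporary file stands in for the real folder.
demo_file <- tempfile(fileext = ".txt")
writeLines(enc2utf8("Wait:don\u2019t go;it's fine"), demo_file, useBytes = TRUE)
for (file in demo_file) {
  tx <- readLines(file, encoding = "UTF-8")
  writeLines(space_punct(tx), con = file, useBytes = TRUE)
}
readLines(demo_file, encoding = "UTF-8")
```

Consecutive punctuation marks (such as ;:!?) each get their own pair of spaces, so runs of spaces may need collapsing afterwards if that matters.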
CodePudding user response:
You can use
tx <- "I want: spaces around all these marks;:!?.but i didn’t want it there in didn't"
gsub("\\s*(\\. |[[:punct:]])(?<!\\b['’]\\b)\\s*", " \\1 ", tx, perl=TRUE)
## => [1] "I want : spaces around all these marks ; : ! ? . but i didn’t want it there in didn't"
The perl=TRUE argument only means that the regex is handled by the PCRE library (note that the PCRE regex engine is not the same as the Perl regex engine).
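A small illustration of why perl=TRUE matters here, using a made-up input string: lookbehinds such as (?<!...) are PCRE syntax, and as far as I know R's default (TRE) engine rejects them with an "invalid regular expression" error.

```r
x <- "ab cb"  # made-up sample input

# With PCRE, the lookbehind is understood: only the b not preceded by a changes.
with_pcre <- gsub("(?<!a)b", "B", x, perl = TRUE)
with_pcre
# [1] "ab cB"

# Without perl = TRUE the default engine cannot compile the pattern;
# the error is caught here so the script keeps running.
without_pcre <- tryCatch(
  suppressWarnings(gsub("(?<!a)b", "B", x)),
  error = function(e) "error"
)
without_pcre
```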
Details:
\s* - zero or more whitespaces
(\. |[[:punct:]]) - Group 1 (\1): a dot followed by a space, or a punctuation char
(?<!\b['’]\b) - immediately on the left, there must be no ' or ’ enclosed with word chars
\s* - zero or more whitespaces
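To see what the lookbehind contributes, here is a small comparison on a made-up string, running the pattern once with and once without the (?<!\b['’]\b) part. The variable names are only for illustration.

```r
x <- "didn't stop:go"  # made-up sample input

# Full pattern from the answer: the apostrophe inside a word is left alone.
with_lookbehind <- gsub("\\s*(\\. |[[:punct:]])(?<!\\b['\u2019]\\b)\\s*",
                        " \\1 ", x, perl = TRUE)
with_lookbehind
# [1] "didn't stop : go"

# Same pattern minus the lookbehind: the apostrophe gets spaced out too.
without_lookbehind <- gsub("\\s*(\\. |[[:punct:]])\\s*", " \\1 ", x, perl = TRUE)
without_lookbehind
# [1] "didn ' t stop : go"
```

Because the lookbehind only rejects an apostrophe that has a word character on both sides, a quote mark at the start or end of a word would still get spaces around it.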