Removing special characters from a sentence creates white spaces. How to fix?

Time:07-01

I have a sentence that has special characters or symbols that I need to delete. Here is the sentence:

text="Please review the entirety of this report to confirm that the detaiÒ‰ÏÎÇ◊ls of the report are of the requested patient. This information shall only be used for the purpose of providing medical or pharmaceutical treatment to a bona fide current patient. This information shall not be provided to any other person or entity except by order of a court of competent jurisdiction."

The aim is to delete the stray characters in "detaiÒ‰ÏÎÇ◊ls" so that it becomes "details". When I ran this regular expression, text.gsub!(/[^a-zA-Z0-9]/," "), it returned this:

pry(#<Role>)> text.gsub!(/[^a-zA-Z0-9]/," ")
=> "Please review the entirety of this report to confirm that the detai      ls of the report are of the requested patient  This information shall only be used for the purpose of providing medical or pharmaceutical treatment to a bona fide current patient  This information shall not be provided to any other person or entity except by order of a court of competent jurisdiction "

When I replaced the matches with an empty string instead, text.gsub!(/[^a-zA-Z0-9]/,""), it removed the spaces as well and merged all the words:

PleasereviewtheentiretyofthisreporttoconfirmthatthedetailsofthereportareoftherequestedpatientThisinformationshallonlybeusedforthepurposeofprovidingmedicalorpharmaceuticaltreatmenttoabonafidecurrentpatientThisinformationshallnotbeprovidedtoanyotherpersonorentityexceptbyorderofacourtofcompetentjurisdiction

Does anyone have a better way of tackling this?

CodePudding user response:

Replace with an empty string as in your second attempt. But also include space in the list of characters that shouldn't be replaced, so it doesn't remove the spaces between words.

text.gsub!(/[^a-zA-Z0-9 ]/,"")
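A minimal sketch of this fix, run on a shortened version of the sentence from the question. The helper names `strip_special` and `strip_special_keep_periods` are made up for illustration and are not part of the original post. Note that the class in the answer also strips punctuation such as full stops; the second helper shows one way to keep them, assuming that is wanted.

```ruby
# Hypothetical helper (not from the original post): delete every character
# that is not an ASCII letter, a digit, or a space.
def strip_special(text)
  text.gsub(/[^a-zA-Z0-9 ]/, "")
end

# Variant that additionally keeps full stops, so sentence boundaries survive.
def strip_special_keep_periods(text)
  text.gsub(/[^a-zA-Z0-9 .]/, "")
end

sample = "confirm that the detaiÒ‰ÏÎÇ◊ls of the report are correct. Thanks."

puts strip_special(sample)
# => confirm that the details of the report are correct Thanks
puts strip_special_keep_periods(sample)
# => confirm that the details of the report are correct. Thanks.
```

The sketch uses the non-mutating `gsub` because `gsub!` returns `nil` when nothing was replaced, which is easy to trip over when the return value is used directly.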

CodePudding user response:

Your pattern matches every single character that is not in the set a-zA-Z0-9, one character at a time: [^a-zA-Z0-9].

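  1. Add a quantifier so that a whole run of unwanted characters is matched at once, e.g. ([^a-zA-Z0-9]{1,}).
  2. Change the replacement from " " to "". Spaces are also matched by this class, so add a space to it (as in [^a-zA-Z0-9 ]) if the words should stay separated.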

https://regex101.com/r/8e7crO/1
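To see what the quantifier actually changes, here is a small comparison on the word from the question. It is only a sketch: the variable names are illustrative, and a space is added to the character class (as in the first answer) so that the gaps between words survive. `{1,}` means "one or more" and is equivalent to `+`.

```ruby
word = "the detaiÒ‰ÏÎÇ◊ls of"

# Without a quantifier each unwanted character is replaced on its own,
# so the six stray characters become six spaces.
per_char = word.gsub(/[^a-zA-Z0-9 ]/, " ")

# With {1,} the whole run is a single match and becomes a single space.
per_run = word.gsub(/[^a-zA-Z0-9 ]{1,}/, " ")

# With an empty replacement the run is simply deleted.
deleted = word.gsub(/[^a-zA-Z0-9 ]{1,}/, "")

p per_char  # => "the detai      ls of"
p per_run   # => "the detai ls of"
p deleted   # => "the details of"
```

When the replacement is an empty string the quantifier does not change the result; it only matters when each match is replaced by something visible, such as a space.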
