This is almost the same as the "OR condition in Regex" question, and many others come close ...
I have an OCR program that reads labels off of pictures. Some of the bits cause small errors: stray single characters in odd places. All the labels will have at least 2 letters, and any wrong characters will be space padded, at least trailing and maybe leading.
GIVEN :
- m Rose
- a a m a this test b c z ^ @
- k This Bigger k
- Great m z
- One Big Good Word This IS About AS LRG Possible and good one
DESIRED :
- Rose
- this test
- This Bigger
- Great
- One Big Good Word This IS About AS LRG Possible and good one
How do I get rid of the oddball singles in C#? I have been trying for hours with single and multiple Regex.Replace calls but am getting nowhere.
str = Regex.Replace(str2, @"([0-9a-zA-Z]{1}) ([0-9a-zA-Z]{2,100})?","$2", RegexOptions.Multiline);
This gets close, but it truncates a letter and the space between words, so "Open Hours" becomes "OpeHours". I am happy to replace with spaces and then use another line to get rid of them. I just can't get the multi-word labels out intact, since the lengths and occurrences are random and my regex skill is average at best. It seems there should be a one-liner for this without having to split and reassemble.
I am after a regex for a reason. I know I could loop through the string and look for spaces before and after, or use other string voodoo ...
CodePudding user response:
Try this (the pattern begins with a space): " .(?= )|(?<= ). |^. | .$"
str = Regex.Replace(str2, @" .(?= )|(?<= ). |^. | .$","", RegexOptions.Multiline);
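To see how this behaves on the labels from the question, here is a minimal sketch. The class name LabelCleaner and the helper Clean are made up for illustration; only the pattern and the Regex.Replace call come from the answer. Each alternative removes one lone character together with one neighbouring space, so real words of two or more characters are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration; not part of the original answer.
public static class LabelCleaner
{
    // Pattern from the answer (note the leading space):
    //   " .(?= )"   lone char between two spaces (removes the space before it)
    //   "(?<= ). "  lone char between two spaces (removes the space after it)
    //   "^. "       lone char at the start of a line
    //   " .$"       lone char at the end of a line
    private static readonly Regex LoneChars =
        new Regex(@" .(?= )|(?<= ). |^. | .$", RegexOptions.Multiline);

    public static string Clean(string label)
    {
        return LoneChars.Replace(label, "");
    }

    public static void Main()
    {
        string[] samples =
        {
            "m Rose",                      // -> "Rose"
            "a a m a this test b c z ^ @", // -> "this test"
            "k This Bigger k",             // -> "This Bigger"
            "Great m z",                   // -> "Great"
            "Open Hours",                  // -> "Open Hours" (unchanged)
        };

        foreach (string s in samples)
        {
            Console.WriteLine("[" + Clean(s) + "]");
        }
    }
}
```

Because "." matches any character, stray symbols such as ^ and @ are removed the same way as stray letters. The sketch cleans one label per call; if the input uses Windows line endings, it is probably safer to split into lines first, since "$" in .NET matches before "\n" only.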
CodePudding user response:
You can use
text = Regex.Replace(text, @"(?:\b\w\b|[^\w\r\n])+", " ");
See the regex demo.
Details:
- (?:\b\w\b|[^\w\r\n])+ - one or more repetitions of either alternative:
  - \b\w\b - a single word char that is a whole word on its own
  - | - or
  - [^\w\r\n] - any char other than a word char, CR, or LF.
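A minimal sketch of this approach on the question's labels follows. It assumes the group is meant to repeat (a + quantifier after the group, which is what "one or more" in the details suggests). The class name WordBoundaryCleaner, the helper Clean, and the final Trim() are additions for illustration, not part of the answer. Trim() is there because every run of noise is replaced with a single space, which otherwise leaves a space at the edge of the label.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration; not part of the original answer.
public static class WordBoundaryCleaner
{
    // A run of lone word chars and non-word chars (spaces, symbols),
    // assuming the group carries a + quantifier.
    private static readonly Regex Noise =
        new Regex(@"(?:\b\w\b|[^\w\r\n])+");

    public static string Clean(string label)
    {
        // Collapse each noisy run to one space, then drop the spaces
        // left at the edges (Trim is an assumption added here).
        return Noise.Replace(label, " ").Trim();
    }

    public static void Main()
    {
        string[] samples =
        {
            "m Rose",                      // -> "Rose"
            "a a m a this test b c z ^ @", // -> "this test"
            "k This Bigger k",             // -> "This Bigger"
            "Great m z",                   // -> "Great"
            "Open Hours",                  // -> "Open Hours" (unchanged)
        };

        foreach (string s in samples)
        {
            Console.WriteLine("[" + Clean(s) + "]");
        }
    }
}
```

One thing to keep in mind with this variant: punctuation inside a label is also treated as noise, so "Mon-Fri" would come out as "Mon Fri". Trim() only cleans the ends of the whole string, so multi-line input would need to be cleaned one line at a time.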