The problem is that I can only match either the first or the last whitespace, while I'm trying to get both in a single re.sub call.
I've tried this regex, which matches any whitespace after a digit, which is not really what I need. Here is the example:
"(?<=\d)\s"
I can't use groups 1 and 4 because the number of groups can change with other strings. The first whitespace will always be after the date, which is always formatted the same way, and the last will be before the cost of the item, but the placement of the decimal point or the number of digits might change depending on the cost.
Anyone have any thoughts?
CodePudding user response:
You could use capture groups here:
import re
inp = "08/26 Card Purchase blah blah IL Card 0000 $14.00"
# \s+ matches one or more whitespace characters after the first term and before the last
output = re.sub(r'^(\S*)\s+(.*?)\s+(\S*)$', r'\1|\2|\3', inp)
print(output)  # 08/26|Card Purchase blah blah IL Card 0000|$14.00
This regex approach works by matching:
^ : from the start of the string
(\S*) : capture an optional non-whitespace term in \1
\s+ : one or more whitespace characters
(.*?) : capture the middle portion of the string in \2
\s+ : one or more whitespace characters before the final term
(\S*) : capture the final optional non-whitespace term in \3
$ : end of the string
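To see what each group captures, the same pattern can be run through re.match and inspected directly. This is only a small sketch: it assumes the whitespace parts of the pattern are written as `\s+` (one or more), and it reuses the sample string from the answer.

```python
import re

inp = "08/26 Card Purchase blah blah IL Card 0000 $14.00"
pattern = re.compile(r'^(\S*)\s+(.*?)\s+(\S*)$')

# match() anchors at the start; the pattern's $ anchors the end.
m = pattern.match(inp)
first, middle, last = m.groups()

print(first)   # 08/26
print(middle)  # Card Purchase blah blah IL Card 0000
print(last)    # $14.00
```

Because the middle group is lazy and the last group must reach the end of the string, the second `\s+` can only land on the final run of whitespace, however many words sit in between.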
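Essentially, the re.sub call uses a splicing trick: it captures the three pieces around the first and last runs of whitespace and joins them back together, with | standing in for that whitespace.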
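If you would rather match only the two whitespace runs themselves, closer to the lookbehind attempt in the question, something like the following might work. It is a sketch under assumptions not confirmed by the question: the date is taken to be exactly two digits, a slash, and two digits at the very start of the line, and the cost is taken to be the last whitespace-free token. The name `replace_outer_whitespace` is made up for illustration.

```python
import re

# Hypothetical helper, not from the answer above. Assumes every line starts
# with a date shaped like NN/NN and ends with the cost as its last token.
OUTER_WS = re.compile(
    r'(?<=^\d\d/\d\d)\s+'   # whitespace right after the leading date
    r'|'
    r'\s+(?=\S+$)'          # whitespace right before the final token
)

def replace_outer_whitespace(line, sep="|"):
    # A function replacement avoids backslash-escape handling in sep.
    return OUTER_WS.sub(lambda m: sep, line)

inp = "08/26 Card Purchase blah blah IL Card 0000 $14.00"
print(replace_outer_whitespace(inp))  # 08/26|Card Purchase blah blah IL Card 0000|$14.00
```

Python's re module requires lookbehinds to be fixed-width, which is why the date shape is spelled out digit by digit here. If the date format can vary in length, the capture-group version is the safer choice.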