I want to match an optional "null" or whitespace at the start of each line. The lines are as follows:
Date Description Amount
null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
14/05/2016 Hydra 12,412
The regex that I used is:
regex_null = re.compile(r"^(?:null)?\s+(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*\.\d{2}\s+(?:Cr)?)$", re.M)
And what I got is:
null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
So the null is not optional. It is currently considered compulsory. Can you please help me with this?
CodePudding user response:
You may use this regex with optional groups:
^\s*(?:null)?\s*(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*(?:\.\d{2})?(\s+Cr)?)$
RegEx Details:
^\s*(?:null)?\s* : Match optional null with 0 or more whitespaces on both sides
(\d{2}/\d{2}/\d{4}) : Match date string in capture group #1
\s+ : Match 1+ whitespaces
(.*?) : Match 0 or more characters (lazily) in capture group #2
\s+ : Match 1+ whitespaces
(\d[\d,]* : Match a digit followed by 0 or more digit/comma characters
(?:\.\d{2})? : Match optional dot and 2 digits
(\s+Cr)?) : Match optional 1+ whitespaces followed by Cr
$ : End
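As a quick check, the pattern above can be run against the sample lines from the question. This is only a minimal sketch: the names sample, pattern and rows are illustrative, and the header line is included to show that it is skipped.

```python
import re

# Sample text from the question; the header line should not match.
sample = """Date Description Amount
null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
14/05/2016 Hydra 12,412"""

# re.M makes ^ and $ match at the start and end of every line.
pattern = re.compile(
    r"^\s*(?:null)?\s*(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*(?:\.\d{2})?(\s+Cr)?)$",
    re.M,
)

# Keep capture groups 1-3: date, description, amount (with optional " Cr").
rows = [m.group(1, 2, 3) for m in pattern.finditer(sample)]
for row in rows:
    print(row)
```

With this pattern the second sample line comes out as description "Nerolac GEB 5.86" and amount "22,512.65 Cr", because the lazy (.*?) keeps growing until the rest of the line can be matched up to $.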
CodePudding user response:
You may apply a regex pattern in multiline mode which makes the first, sixth, and seventh values optional in the line.
import re

inp = """ null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
14/05/2016 Hydra 12,412"""
# re.M lets ^ match at the start of every line
lines = re.findall(r'^\s*(null)?\s*(\d{1,2}/\d{1,2}/\d{4}) (\w+(?: \w+)*) (\d{1,3}(?:,\d{3})*(?:\.\d+)?)?(?: (\d{1,3}(?:,\d{3})*(?:\.\d+)?))?(?: (\w+))?', inp, flags=re.M)
print(lines)
This prints:
[('null', '12/05/2016', 'Asian Paints', '2,150.65', '', ''),
('', '13/05/2016', 'Nerolac GEB', '5.86', '22,512.65', 'Cr'),
('', '14/05/2016', 'Hydra', '12,412', '', '')]
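If the positional tuples are hard to read, the same pattern could be written with named groups and collected with groupdict(). This is a sketch rather than part of the answer above: the group names (marker, date, desc, first, second, suffix) are hypothetical labels chosen for illustration. Note that groupdict() reports groups that did not take part in the match as None, where findall gives an empty string.

```python
import re

inp = """ null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
14/05/2016 Hydra 12,412"""

# A number with optional thousands separators and optional decimals.
num = r"\d{1,3}(?:,\d{3})*(?:\.\d+)?"

# Same structure as the findall pattern, split into labelled pieces.
# The group names are assumptions made for this sketch.
pattern = re.compile(
    "".join([
        r"^\s*(?P<marker>null)?\s*",
        r"(?P<date>\d{1,2}/\d{1,2}/\d{4})",
        r" (?P<desc>\w+(?: \w+)*)",
        r" (?P<first>" + num + r")?",
        r"(?: (?P<second>" + num + r"))?",
        r"(?: (?P<suffix>\w+))?",
    ]),
    re.M,
)

records = [m.groupdict() for m in pattern.finditer(inp)]
for record in records:
    print(record)
```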