Regex pattern is matching part of a number with more than 4 digits

import re
text = """State of California that the foregoing is true and correct. (For California sheriff or marshal use only) 1950-24-12 I certify that the foregoing is true and correct. Date: (SIGNATURE) SUBP-010 [Rev. January 1,2012] PROOF OF SERVICE OF DEPOSITION SUBPOENA FOR PRODUCTION OF BUSINESS RECORDS 055826-00-07 Page 2 of 2"""
# Raw string so that \d reaches the regex engine without an invalid-escape warning
matches = re.findall(r"\d{2,4}[-]\d{1,2}[-]\d{1,2}", text)
print(matches)

Required_output: 1950-24-12

The pattern also returns 5826-00-07, even though that is only part of 055826-00-07, whose first group has more than 4 digits. Is there a way to exclude such matches?

CodePudding user response:

What you want is called a negative lookbehind. It lets a pattern match only when the text immediately before the match does not match a given sequence. For example, (?<!something)abc matches any occurrence of "abc" that is not immediately preceded by "something".
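As a small illustration of that behaviour, here is a sketch on a made-up string (the sample text and variable names are assumptions for the demo, not part of the question):

```python
import re

# Hypothetical sample: the first "abc" directly follows "something", the second does not.
sample = "somethingabc and otherabc"

# Without a lookbehind, both occurrences are found.
plain = [m.start() for m in re.finditer(r"abc", sample)]

# With the negative lookbehind, the occurrence right after "something" is skipped.
guarded = [m.start() for m in re.finditer(r"(?<!something)abc", sample)]

print(plain)    # start offsets of every "abc"
print(guarded)  # start offsets of "abc" not preceded by "something"
```

Note that Python's re module only accepts fixed-width lookbehind patterns, which both "something" and \d satisfy.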

So in your case, add (?<!\d) to the beginning of your regex so that it only matches when the first digit group is not preceded by another digit.

Also, [-] matches only the character -, so you don't need the brackets. With both changes, the new regex is (?<!\d)\d{2,4}-\d{1,2}-\d{1,2}.
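Putting it together, a minimal sketch run against a shortened excerpt of the text from the question (the excerpt and the variable names are chosen here for illustration):

```python
import re

# Shortened excerpt of the text from the question.
text = (
    "(For California sheriff or marshal use only) 1950-24-12 I certify that "
    "the foregoing is true and correct. SUBP-010 [Rev. January 1,2012] "
    "BUSINESS RECORDS 055826-00-07 Page 2 of 2"
)

# Original pattern: also picks up the tail of the 6-digit number.
before = re.findall(r"\d{2,4}-\d{1,2}-\d{1,2}", text)

# With the negative lookbehind: the first group may not start in the middle of a number.
after = re.findall(r"(?<!\d)\d{2,4}-\d{1,2}-\d{1,2}", text)

print(before)  # ['1950-24-12', '5826-00-07']
print(after)   # ['1950-24-12']
```

If the last group could also be followed by extra digits (something like 1950-24-123), appending a negative lookahead (?!\d) to the pattern would guard that end in the same way. That is an optional extension and is not needed for the text in the question.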