Regex Pattern Should Not Match More Than One Sentence

Time:08-26

The pattern the (?!.*(yummyy|tummyy)).{0,100} (has|have) (?!.*(yummyy|tummyy)).{0,500} compliance (?!.*(yummyy|tummyy)).{0,500} future matches the string below, even though "has", "compliance" and "future" sit in different sentences:

the district has a number of capital appreciation bonds outstanding that were issued at deep discounts. these discounts are being accreted over the life of the bonds. in the year ended june 30 2017 $1317869 was accreted and the accumulated accreted balance is presented within the long-term liabilities. there are a number of limitations and restrictions contained in the general obligation bond indenture. management has indicated that the district is in compliance with all significant limitations and restrictions at june 30 2017. h. commitments under leases capital leases the district has entered into three lease agreements for the purchase of technology equipment and buses. the lease terms range from 32 to 60 months with interest rates ranging from 1.499% to 3.587%. as of june 30 2017 the future

How can we stop the regex from matching across multiple sentences? We want the pattern to match strings like the ones below:

the district has undertaken measures to ensure full compliance in all future filings.

the board and people have undertaken procedures and appointed Mr. K to ensure consistent compliance with respect to prior undertakings in future haha.

We do not want the pattern to match the string below:

the county has been in full compliance lol. will ensure in future

Thanks so much!

CodePudding user response:

In the pattern you are using, you can add one more alternative to each negative lookahead: the punctuation, followed by a space, that the match should not cross. To still allow Mr. inside a match, add a negative lookbehind asserting that Mr is not directly to the left of that punctuation.

(yummyy|tummyy|(?<!\bMr)[.?!] )

The whole pattern:

the (?!.*(yummyy|tummyy|(?<!\bMr)[.?!] )).{0,100} (has|have) (?!.*(yummyy|tummyy|(?<!\bMr)[.?!] )).{0,500} compliance (?!.*(yummyy|tummyy|(?<!\bMr)[.?!] )).{0,500} future
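A minimal Python sketch of this pattern, run against the three example strings from the question. It assumes each input is a single line (the `.` in the lookaheads does not cross newlines by default); the names `GUARD`, `ORIGINAL`, `GUARDED` and `samples` are illustrative only.

```python
import re

# Lookahead from the answer: fail if a forbidden word, or sentence punctuation
# followed by a space (unless it directly follows "Mr"), appears later on the line.
GUARD = r"(?!.*(yummyy|tummyy|(?<!\bMr)[.?!] ))"

# Pattern from the question, for comparison.
ORIGINAL = re.compile(
    r"the (?!.*(yummyy|tummyy)).{0,100} (has|have) "
    r"(?!.*(yummyy|tummyy)).{0,500} compliance "
    r"(?!.*(yummyy|tummyy)).{0,500} future"
)

# Same pattern with the extended lookahead in all three positions.
GUARDED = re.compile(
    "the " + GUARD + ".{0,100} (has|have) "
    + GUARD + ".{0,500} compliance "
    + GUARD + ".{0,500} future"
)

samples = [
    "the district has undertaken measures to ensure full compliance in all future filings.",
    "the board and people have undertaken procedures and appointed Mr. K to ensure "
    "consistent compliance with respect to prior undertakings in future haha.",
    "the county has been in full compliance lol. will ensure in future",
]

for s in samples:
    # Columns: original pattern matched?  guarded pattern matched?  start of the text
    print(bool(ORIGINAL.search(s)), bool(GUARDED.search(s)), s[:30])
```

The original pattern matches all three strings; the guarded one rejects only the third, where "lol. " separates "compliance" from "future". Note that the lookahead scans to the end of the line, so any later sentence on the same line also blocks a match.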


CodePudding user response:

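To keep a match from crossing a dot followed by whitespace, replace each . with ((?!\.\s).), which matches any character except a dot that is followed by whitespace.

You can also delete all but the first (?!.*(yummyy|tummyy)): the .* in the first lookahead already scans everything that follows, so the later ones are redundant.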

Try this:

the (?!.*(yummyy|tummyy))((?!\.\s).){0,100} (has|have) ((?!\.\s).){0,500} compliance ((?!\.\s).){0,500} future
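A small Python sketch of this approach on the question's examples, again assuming single-line input. The helper `build` and the `TEMPERED_MR` variant are hypothetical additions, not part of the answer: the variant borrows the `(?<!\bMr)` lookbehind from the other answer, because the plain version also stops at the dot in "Mr. K".

```python
import re

def build(unit):
    """Assemble the pattern, using `unit` in place of the plain dot."""
    return re.compile(
        r"the (?!.*(yummyy|tummyy))" + unit + r"{0,100} (has|have) "
        + unit + r"{0,500} compliance "
        + unit + r"{0,500} future"
    )

# From the answer: any character except a dot that is followed by whitespace.
TEMPERED = r"((?!\.\s).)"
# Hypothetical variant: a dot directly after "Mr" does not count as a boundary.
TEMPERED_MR = r"((?!(?<!\bMr)\.\s).)"

plain = build(TEMPERED)
with_mr = build(TEMPERED_MR)

wanted_1 = "the district has undertaken measures to ensure full compliance in all future filings."
wanted_2 = ("the board and people have undertaken procedures and appointed Mr. K to ensure "
            "consistent compliance with respect to prior undertakings in future haha.")
unwanted = "the county has been in full compliance lol. will ensure in future"

for s in (wanted_1, wanted_2, unwanted):
    # Columns: answer's pattern matched?  Mr-aware variant matched?
    print(bool(plain.search(s)), bool(with_mr.search(s)))
```

With the answer's pattern the second string is rejected because of "Mr. "; the variant accepts it while still rejecting the third string.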

CodePudding user response:

import re

text = '''
the district has a number of capital appreciation bonds outstanding that were 
issued at deep discounts. these discounts are being accreted over the life of 
the bonds. in the year ended june 30 2017 $1317869 was accreted and the 
accumulated accreted balance is presented within the long-term liabilities. 
there are a number of limitations and restrictions contained in the general 
obligation bond indenture. management has indicated that the district 
is in compliance with all significant limitations and restrictions at
 june 30 2017. h. commitments under leases capital leases the district has
  entered into three lease agreements for the purchase of technology equipment and buses. 
  the lease terms range from 32 to 60 months with interest rates ranging 
  from 1.499% to 3.587%. as of june 30 2017 the future
'''

for l in re.split(r'(?<=[a-z]{3}\.)(?=\s)', text):
    print(l)
  • (?<=[a-z]{3}\.) - positive lookbehind: three lowercase letters and a literal dot, taken as the end of a sentence.
  • (?=\s) - positive lookahead for the whitespace after the dot.

Output:

the district has a number of capital appreciation bonds outstanding that were issued at deep discounts. these discounts are being accreted over the life of the bonds. in the year ended june 30 2017 $1317869 was accreted and the accumulated accreted balance is presented within the long-term liabilities.

there are a number of limitations and restrictions contained in the general obligation bond indenture. management has indicated that the district is in compliance with all significant limitations and restrictions at june 30 2017. h. commitments under leases capital leases the district has entered into three lease agreements for the purchase of technology equipment and buses.

the lease terms range from 32 to 60 months with interest rates ranging from 1.499% to 3.587%. as of june 30 2017 the future
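One way this split could be combined with the pattern from the question is to search each sentence separately, so a match can never span a boundary. The sketch below is an assumption about how the pieces fit together, not something stated in the answer; `matching_sentences` is a hypothetical helper, and splitting on an empty match requires Python 3.7 or later.

```python
import re

# Sentence boundary from the answer: three lowercase letters and a dot,
# with whitespace right after it.
BOUNDARY = re.compile(r"(?<=[a-z]{3}\.)(?=\s)")

# Pattern from the question, unchanged.
PATTERN = re.compile(
    r"the (?!.*(yummyy|tummyy)).{0,100} (has|have) "
    r"(?!.*(yummyy|tummyy)).{0,500} compliance "
    r"(?!.*(yummyy|tummyy)).{0,500} future"
)

def matching_sentences(text):
    """Return the sentences of `text` that the pattern matches on their own."""
    # Collapse line breaks and repeated spaces so wrapped sentences stay whole.
    flat = " ".join(text.split())
    sentences = (s.strip() for s in BOUNDARY.split(flat))
    return [s for s in sentences if PATTERN.search(s)]

sample = (
    "the county has been in full compliance lol. will ensure in future. "
    "the district has undertaken measures to ensure full compliance in all future filings."
)
for sentence in matching_sentences(sample):
    print(sentence)
```

Because the lookbehind requires three lowercase letters before the dot, "Mr. K" and "june 30 2017." are not treated as boundaries, which keeps such sentences in one piece.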
