Regex finding the last occurrence of the highest number after a matching string

Time:11-20

I have the following text and I want a regex matching the last page of each file: https://regex101.com/r/DmVnK7/1

The correct regex should match the bold lines below (the last page of each file):

A_File1_Page1
**A_File1_Page2**

A_File2_Page1
A_File2_Page2
**A_File2_Page3**

B_File1_Page1
B_File1_Page2
**B_File1_Page3**

B_File2_Page1
B_File2_Page2
B_File2_Page3
**B_File2_Page4**

C_File1_Page1
C_File1_Page2
C_File1_Page3
C_File1_Page4
**C_File1_Page5**

CodePudding user response:

Regular expression

/(^.*_Page)\d+$(?!\r?\n\1\d+$)/gm

Example

https://regex101.com/r/Q2Ymk2/1

Description

  • 1st Capturing Group (^.*_Page)
    • ^ asserts position at start of a line
    • . matches any character (except for line terminators)
    • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    • _Page matches the characters _Page literally (case sensitive)
  • \d matches a digit (equivalent to [0-9])
    • + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  • $ asserts position at the end of a line
  • Negative Lookahead (?!\r?\n\1\d+$)
    • Assert that the Regex below does not match
    • \r matches a carriage return (ASCII 13)
      • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
    • \n matches a line-feed (newline) character (ASCII 10)
    • \1 matches the same text as most recently matched by the 1st capturing group
    • \d matches a digit (equivalent to [0-9])
      • + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    • $ asserts position at the end of a line

Global pattern flags

  • g modifier: global. All matches (don't return after first match)
  • m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
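
As a rough check of the pattern above, here is a minimal Python sketch. The names SAMPLE, LAST_PAGE and last_pages are made up for illustration, and the sample is a shortened version of the text in the question, assumed to use "\n" line endings. Note that the lookahead only inspects the next line, so this relies on each file's pages being listed consecutively.

```python
import re

# Shortened sample modeled on the question (assumes "\n" line endings).
SAMPLE = "\n".join([
    "A_File1_Page1", "A_File1_Page2", "",
    "A_File2_Page1", "A_File2_Page2", "A_File2_Page3", "",
    "B_File1_Page1", "B_File1_Page2", "B_File1_Page3",
])

# re.MULTILINE plays the role of the "m" flag; finditer plays the role of "g".
LAST_PAGE = re.compile(r"(^.*_Page)\d+$(?!\r?\n\1\d+$)", re.MULTILINE)

def last_pages(text):
    """Return every line whose next line does not share the same '..._Page' prefix."""
    return [m.group(0) for m in LAST_PAGE.finditer(text)]

print(last_pages(SAMPLE))
# → ['A_File1_Page2', 'A_File2_Page3', 'B_File1_Page3']
```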

CodePudding user response:

With regex alone, I think you can only get the last occurrence, not the highest number,
mostly because there is no regex construct for counting or comparing numbers.
If you need the highest number, match all pages with (.*?Page\d+), then sort and de-duplicate the results.
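
One way to do the "match everything, then pick the highest" step outside the regex is sketched below in Python. The helper name highest_pages and the anchored pattern ^(.*_Page)(\d+)$ are assumptions for illustration, not part of the answer; instead of sorting, the sketch keeps a running maximum per file prefix, which gives the same result.

```python
import re

# Hypothetical pattern: group 1 is the file prefix, group 2 the page number.
PAGE_LINE = re.compile(r"^(.*_Page)(\d+)$", re.MULTILINE)

def highest_pages(text):
    """Return the highest-numbered page per file, regardless of line order."""
    best = {}  # prefix -> highest page number seen so far
    for m in PAGE_LINE.finditer(text):
        prefix, page = m.group(1), int(m.group(2))
        if page > best.get(prefix, -1):
            best[prefix] = page
    # dicts keep insertion order, so files appear in order of first occurrence
    return [f"{prefix}{page}" for prefix, page in best.items()]

unsorted_text = "A_File1_Page2\nA_File1_Page10\nA_File1_Page1\nB_File1_Page1"
print(highest_pages(unsorted_text))
# → ['A_File1_Page10', 'B_File1_Page1']
```

Comparing the numbers as integers is what makes Page10 rank above Page2; a plain string sort would not.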

If getting the last occurrence of each file's pages is enough, then use this:

(.*?Page)\d+(?![\s\S]*\1)

https://regex101.com/r/iP3FcV/1

 ( .*? Page )                  # (1)
 \d+
 (?! [\s\S]* \1 )
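
A minimal Python sketch of this second pattern follows. The names SAMPLE, LAST_OCCURRENCE and last_occurrences are hypothetical and the sample is shortened. Because the lookahead scans the whole remaining text for the captured prefix, the match is the last occurrence of each prefix in the text, which is the highest page only if the pages are listed in ascending order.

```python
import re

# Shortened sample; pages are assumed to be listed in ascending order.
SAMPLE = "\n".join([
    "A_File1_Page1", "A_File1_Page2",
    "B_File1_Page1", "B_File1_Page2", "B_File1_Page3",
])

# No flags needed: "." stops at newlines, while [\s\S]* crosses them.
LAST_OCCURRENCE = re.compile(r"(.*?Page)\d+(?![\s\S]*\1)")

def last_occurrences(text):
    """Return each '...Page<n>' whose prefix never appears again later in the text."""
    return [m.group(0) for m in LAST_OCCURRENCE.finditer(text)]

print(last_occurrences(SAMPLE))
# → ['A_File1_Page2', 'B_File1_Page3']
```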