Regex expression to match last numerical component, but exclude file extension

Time:02-12

I'm stumped trying to figure out a regex expression. Given a file path, I need to match the last numerical component of the path ("frame" number in an image sequence), but also ignore any numerical component in the file extension.

For example, given path:

/path/to/file/abc123/GCAM5423.xmp

The following expression will correctly match 5423.

((?P<index>(?P<padding>0*)\d+)(?!.*(0*)\d+))

However, this expression fails if for example the file extension contains a number as follows:

/path/to/file/abc123/GCAM5423.cr2

In this case the expression will match the 2 in the file extension, when I still need it to match 5423. How can I modify the above expression to ignore file extensions that have a numerical component?

I'm using the Python flavor of regex. Thanks in advance!

CodePudding user response:

You can try this one:
\/[a-zA-Z]*(\d*)\.[a-zA-Z0-9]{3,4}$
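A quick way to check this pattern against the two sample paths from the question. This is only a sketch; the helper name `last_frame_number` is made up for illustration.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"\/[a-zA-Z]*(\d*)\.[a-zA-Z0-9]{3,4}$")

def last_frame_number(path):
    """Return the digits directly before the extension, or None if no match."""
    match = PATTERN.search(path)
    return match.group(1) if match else None

for path in ("/path/to/file/abc123/GCAM5423.xmp",
             "/path/to/file/abc123/GCAM5423.cr2"):
    print(path, "->", last_frame_number(path))
```

Two limits worth knowing: because the group is `(\d*)`, a file name with no digits still matches and yields an empty string, and `[a-zA-Z]*` only allows letters before the digits, so a name containing an underscore or hyphen will not match.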

CodePudding user response:

Step 1: Find the substring before the last dot.

(.*)\.

Input: /path/to/file/abc123/GCAM5423.cr2

Output: /path/to/file/abc123/GCAM5423

Step 2: Find the last number using your regex.

Input: /path/to/file/abc123/GCAM5423

Output: 5423

I don't know how to combine these two regexes into one, but I hope this is still useful to you ^_^
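The two steps do not have to be merged into one regex; they can simply be chained in Python. A minimal sketch, assuming the question's pattern is meant to use `\d+`, and with a hypothetical function name `frame_number`:

```python
import re

# Step 1: greedy (.*) runs up to the last dot, which drops the extension.
STEM = re.compile(r"(.*)\.")
# Step 2: the question's pattern, assuming "\d+" is intended.
LAST_NUMBER = re.compile(r"(?P<index>(?P<padding>0*)\d+)(?!.*(0*)\d+)")

def frame_number(path):
    """Return the last run of digits before the extension, or None."""
    stem_match = STEM.match(path)
    # If the path has no dot at all, search the whole path instead.
    stem = stem_match.group(1) if stem_match else path
    number_match = LAST_NUMBER.search(stem)
    return number_match.group("index") if number_match else None

print(frame_number("/path/to/file/abc123/GCAM5423.cr2"))
print(frame_number("/path/to/file/abc123/GCAM5423.xmp"))
```

One caveat: step 1 cuts at the last dot anywhere in the path, so a file with no extension inside a directory whose name contains a dot would be truncated in the wrong place.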

CodePudding user response:

Try this pattern: \/[^/\d\s]+(\d+)\.[^/]+$


Code:

import re

pattern = r"\/[^/\d\s]+(\d+)\.[^/]+$"

texts = ['/path/to/file/abc123/GCAM5423.xmp', '/path/to/file/abc123/GCAM5423.cr2']

print([match.group(1) for x in texts if (match := re.search(pattern, x))])

Output:

['5423', '5423']
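If the named groups from the question (`index` and `padding`) are still needed, the capture group in this pattern could be swapped for them. This is a sketch of one possible adaptation, not something the answer itself proposes; the function name `parse_frame` and the `IMG_0042` path are made up for illustration.

```python
import re

# Same structure as the pattern above, with the question's named groups
# in place of the plain (\d+) group.
PATTERN = re.compile(r"\/[^/\d\s]+(?P<index>(?P<padding>0*)\d+)\.[^/]+$")

def parse_frame(path):
    """Return (index, padding) for the frame number, or None if no match."""
    match = PATTERN.search(path)
    if match is None:
        return None
    return match.group("index"), match.group("padding")

print(parse_frame("/path/to/file/abc123/GCAM5423.cr2"))
print(parse_frame("/path/to/file/abc123/IMG_0042.cr2"))  # hypothetical padded name
```

As with the original pattern, the digits must sit directly before the dot, and the file name must start with at least one non-digit character.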
