How can I catch the following groups with a regex?

Hello, I have the following two strings:

txt = '/path/to/photo/file.jpg'
txt = '/path/to/photo/file_crXXX.jpg'

In the second string, XXX is a long, variable suffix that carries information in the name because the file has been processed.

I want to extract the name 'file' from both paths.

To do this, I tried the following code, one pattern per string:

re.search(".*/(.*)\.jpg", txt).group(1)
re.search(".*/(.*)_cr.*", txt).group(1)

But when I try to combine them into one expression with the following code:

re.search(".*/(.*)(_cr.*)|(\.jpg)*", txt).group(1)
re.search(".*/(.*)(\.jpg)|(_cr.*)", txt).group(1)

it doesn't work properly. How can I do this?

Thanks

CodePudding user response：

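The problem is that | has the lowest precedence in a regex, so it splits each of your patterns into two complete alternatives instead of choosing only between the two suffixes. You also captured groups that do not need to be captured. Your second attempt, .*/(.*)(\.jpg)|(_cr.*), was the closer of the two. Use this regex to capture only the filename or its prefix: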

([^/]*?)(?:\.jpg|_cr.*)$
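For readers who find the one-liner dense, here is the same pattern written with re.VERBOSE so that each part can carry a comment. This is only an annotated sketch; the name FILENAME_RE is made up for illustration.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can be commented.
# FILENAME_RE is a hypothetical name chosen for this sketch.
FILENAME_RE = re.compile(
    r"""
    ([^/]*?)        # group 1: as few non-slash characters as possible
    (?:             # non-capturing group for the part to discard
        \.jpg       #   either a literal .jpg
      | _cr.*       #   or _cr followed by anything
    )
    $               # anchored at the end of the string
    """,
    re.VERBOSE,
)

match = FILENAME_RE.search("/path/to/photo/file_crXXX.jpg")
name = match.group(1)
print(name)
```

The lazy quantifier in group 1 matters here: it lets the match stop at the first .jpg or _cr it meets rather than swallowing the suffix.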

A short Python demo:

import re

paths = ["/path/to/photo/file.jpg", "/path/to/photo/file_crXXX.jpg"]
for path in paths:
    # Lazily capture the last path component up to ".jpg" or "_cr..."
    print(re.search(r"([^/]*?)(?:\.jpg|_cr.*)$", path).group(1))  # prints: file
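To see why the two combined patterns from the question misbehave, the following sketch runs them against both paths and records what group 1 contains. The patterns are copied from the question; only the raw-string prefix and the variable names (attempts, results) are additions for this illustration.

```python
import re

paths = ["/path/to/photo/file.jpg", "/path/to/photo/file_crXXX.jpg"]

# The two combined patterns from the question, unchanged apart from raw strings.
attempts = [r".*/(.*)(_cr.*)|(\.jpg)*", r".*/(.*)(\.jpg)|(_cr.*)"]

results = {}
for pattern in attempts:
    # "|" splits the whole pattern in two, so group 1 only exists in the left branch.
    results[pattern] = [re.search(pattern, p).group(1) for p in paths]
    print(pattern, "->", results[pattern])
```

With the first pattern, the plain .jpg path falls through to the right-hand branch, which matches an empty string and leaves group 1 as None. With the second, the left-hand branch matches both paths, but the greedy (.*) keeps the _crXXX part. Wrapping only the suffixes in (?:...) avoids both effects.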

CodePudding user response：

Since you're dealing with paths, why not use pathlib?

For instance:

import pathlib

files = [
    "/path/to/photo/abc1.jpg",
    "/path/to/photo/def2.jpg",
    "/path/to/photo/ghi3.jpg",
    "/path/to/photo/file1_cr.jpg",
    "/path/to/photo/file2_cr2.jpg",
    "/path/to/photo/file3_crY.jpg",
    ]

stubs = []

for f in files:
    stem = pathlib.Path(f).stem
    try:
        stub, _ = stem.split("_", maxsplit=1)
    except ValueError:
        stub = stem
    stubs.append(stub)

print(stubs)  # ['abc1', 'def2', 'ghi3', 'file1', 'file2', 'file3']
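One caveat: the loop above cuts at the first underscore, so a name that itself contains an underscore would be shortened too far. A possible variant, assuming the processed suffix always starts with "_cr" as in the question, cuts at that marker instead. The helper name original_name and the third sample path are made up for illustration.

```python
import pathlib

def original_name(path: str) -> str:
    """Return the file stem with any "_cr..." suffix removed (hypothetical helper)."""
    stem = pathlib.Path(path).stem
    # partition() returns the whole stem as the first item when "_cr" is absent,
    # so no try/except is needed.
    return stem.partition("_cr")[0]

names = [original_name(p) for p in (
    "/path/to/photo/file.jpg",
    "/path/to/photo/file_crXXX.jpg",
    "/path/to/photo/my_photo_crX.jpg",  # made-up name with an extra underscore
)]
print(names)
```

Like the regex answer, this cuts at the first occurrence of "_cr"; if the original name could itself contain "_cr", rpartition or a stricter pattern would be needed.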