Hello, I have the following two strings:
txt = '/path/to/photo/file.jpg'
txt = '/path/to/photo/file_crXXX.jpg'
In the second string, XXX stands for a long, variable part of the name that carries information because the file has been processed.
I want to extract the name 'file' from both paths.
To do this, I tried the following code:
re.search(r".*/(.*)\.jpg", txt).group(1)
re.search(r".*/(.*)_cr.*", txt).group(1)
But when I try to combine them into one expression with either of the following:
re.search(r".*/(.*)(_cr.*)|(\.jpg)*", txt).group(1)
re.search(r".*/(.*)(\.jpg)|(_cr.*)", txt).group(1)
it doesn't work properly. How can I do this?
Thanks
CodePudding user response:
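The problem is that the two suffixes were written as separate capturing groups that do not need to be captured, and because they were not grouped together, the | split the whole pattern into two alternatives. Your attempt .*/(.*)(\.jpg)|(_cr.*) was close to the answer, though. Use this regex to capture only the file name or its prefix: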
([^/]*?)(?:\.jpg|_cr.*)$
import re
paths = ["/path/to/photo/file.jpg", "/path/to/photo/file_crXXX.jpg"]
for path in paths:
    print(re.search(r"([^/]*?)(?:\.jpg|_cr.*)$", path).group(1))
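To see why the combined attempt from the question misbehaves, here is a small sketch that runs it next to a grouped version. The names attempt, grouped, and base_name are only illustrative, and the grouped pattern is one possible variant rather than the only fix. It assumes every path contains at least one slash.

```python
import re

paths = ["/path/to/photo/file.jpg", "/path/to/photo/file_crXXX.jpg"]

# Attempt from the question: "|" has the lowest precedence, so this reads as
#   .*/(.*)(\.jpg)   OR   (_cr.*)
# and the first alternative happily swallows "_crXXX" into group 1.
attempt = re.compile(r".*/(.*)(\.jpg)|(_cr.*)")

# Putting both suffixes in one non-capturing group keeps the ".*/" prefix
# common to them; the lazy (.*?) stops at the first suffix that fits.
grouped = re.compile(r".*/(.*?)(?:\.jpg|_cr.*)$")

attempt_results = [attempt.search(p).group(1) for p in paths]
grouped_results = [grouped.search(p).group(1) for p in paths]

print(attempt_results)  # ['file', 'file_crXXX']
print(grouped_results)  # ['file', 'file']


def base_name(path):
    """Return the captured name, or None when the pattern does not match."""
    match = grouped.search(path)
    return match.group(1) if match else None


print(base_name("/path/to/photo/file.png"))  # None
```

The base_name helper is there because re.search returns None on a failed match, so calling .group(1) directly would raise an AttributeError for a path with a different extension.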
CodePudding user response:
Since you're dealing with paths, why don't you use pathlib?
For instance:
import pathlib

files = [
    "/path/to/photo/abc1.jpg",
    "/path/to/photo/def2.jpg",
    "/path/to/photo/ghi3.jpg",
    "/path/to/photo/file1_cr.jpg",
    "/path/to/photo/file2_cr2.jpg",
    "/path/to/photo/file3_crY.jpg",
]

stubs = []
for f in files:
    # .stem is the last path component without its extension
    stem = pathlib.Path(f).stem
    try:
        # Keep only the part before the first underscore
        stub, _ = stem.split("_", maxsplit=1)
    except ValueError:
        # No underscore: split() returned one item, so unpacking failed
        stub = stem
    stubs.append(stub)

print(stubs)  # ['abc1', 'def2', 'ghi3', 'file1', 'file2', 'file3']
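One caveat with splitting on the first underscore: a name such as my_photo_cr2.jpg would be cut down to my. If that matters, a possible variant is to split on the _cr marker instead. This is only a sketch; stub_of and its marker parameter are hypothetical names, and it assumes "_cr" never appears inside the base name itself.

```python
from pathlib import PurePosixPath


def stub_of(path, marker="_cr"):
    """Return the file stem with everything from the first `marker` onward removed."""
    # PurePosixPath only parses the string; it never touches the file system.
    stem = PurePosixPath(path).stem
    # partition() returns the whole stem as the head when the marker is absent,
    # so no try/except is needed.
    head, _, _ = stem.partition(marker)
    return head


print(stub_of("/path/to/photo/file_crXXX.jpg"))    # file
print(stub_of("/path/to/photo/my_photo_cr2.jpg"))  # my_photo
print(stub_of("/path/to/photo/abc1.jpg"))          # abc1
```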