How to find filenames with a specific extension using regex?

Time:07-16

How can I grab 'dlc3.csv' and 'spongebob.csv' from the string below using the quickest possible method, which I assume is regex?

4918,  fx,fx,weapon/muzzleflashes/fx_m1carbine,3.3,3.3,|sp/zombie_m1carbine|weapon|../zone_source/dlc3.csv|csv|../zone_source/spongebob.csv|csv

I've already managed to do this using split() and for loops, but it's slowing my program down far too much.

I would post an example of my current code, but it has a lot of other stuff in it, so it would only prompt more questions.

In a nutshell, I'm opening a large 6,000-line .csv file and then using nested for loops to iterate through each line, with .split() to find specific parts of each line. I have many files where I need to scan for specific things on each line. So far I've only implemented a couple of features in my Qt program, and it already takes up to 5 seconds to load some things and up to 10 seconds for others, all because of the nested loops. I've looked at where to use range, where not to, and where to use enumerate. I also use time.time() and logging.info() to measure the speed of each code change.

After asking around, I've been told that a regex is the best option for me, as it would remove the need for many of my for loops. The problem is I have no clue how to use regex. I do plan on learning it, but if someone could help me out with this it would be much appreciated.

Thanks.

Edit: to clarify, when scanning each line the filename is unknown; ".csv" is the only known part. So I need the regex to grab every filename that ends in .csv, but without grabbing the path in front of the filename.

Currently I look for .csv using .split('/') and .split('|'), then check whether .csv is in each list item to grab the "unknown" filename. Some lines have only one filename, whereas others have two, so the regex needs to account for this too.
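For comparison, the split-based approach described above can be written with a single loop over the '|'-separated fields of a line, with no nesting. This is a minimal sketch, not the asker's actual code: it assumes each path sits in its own '|'-separated field and that the trailing newline has already been stripped from the line. The function name csv_names_by_split is hypothetical.

```python
def csv_names_by_split(line):
    """Return the basename of every '|'-separated field that ends in '.csv'.

    Assumes the line has no trailing newline and each path is its own field.
    """
    names = []
    for field in line.split('|'):
        if field.endswith('.csv'):
            # Keep only the part after the last '/', i.e. the filename itself.
            names.append(field.rsplit('/', 1)[-1])
    return names


line = ('4918,  fx,fx,weapon/muzzleflashes/fx_m1carbine,3.3,3.3,'
        '|sp/zombie_m1carbine|weapon|../zone_source/dlc3.csv|csv'
        '|../zone_source/spongebob.csv|csv')
print(csv_names_by_split(line))  # ['dlc3.csv', 'spongebob.csv']
```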

Answer:

You can use this pattern: [^/]*\.csv

Breakdown:

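  • [^/] - Any character that's not a forward slash. (A negated class like this also matches newlines and '|'.)
    • * - Zero or more of them
  • \. - A literal dot. (The backslash is necessary because an unescaped dot is a special character in regex that matches any character.)
  • csv - The literal letters csv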
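Because findall returns every non-overlapping match, the same pattern copes with lines that contain one filename or two. The sketch below checks that on shortened, made-up lines, and also shows one caveat: [^/] matches '|' as well, so a filename with no directory in front of it drags the previous field along. The strict variant, which also excludes '|', is an adjustment that is not part of the original answer and assumes filenames never contain '|'.

```python
import re

pattern = re.compile(r'[^/]*\.csv')   # the pattern from the answer
strict = re.compile(r'[^/|]*\.csv')   # hypothetical variant that also stops at '|'

one_file = 'x|../zone_source/dlc3.csv|csv'
two_files = 'x|../zone_source/dlc3.csv|csv|../zone_source/spongebob.csv|csv'
no_dir = 'weapon|dlc3.csv|csv'  # made-up line with no directory before the name

print(pattern.findall(one_file))   # ['dlc3.csv']
print(pattern.findall(two_files))  # ['dlc3.csv', 'spongebob.csv']
print(pattern.findall(no_dir))     # ['weapon|dlc3.csv'] because [^/] matched '|'
print(strict.findall(no_dir))      # ['dlc3.csv']
```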

For example:

import re

s = '''4918,  fx,fx,weapon/muzzleflashes/fx_m1carbine,3.3,3.3,|sp/zombie_m1carbine|weapon|../zone_source/dlc3.csv|csv|../zone_source/spongebob.csv|csv'''

pattern = re.compile(r'[^/]*\.csv')

result = pattern.findall(s)
print(result)

Result:

['dlc3.csv', 'spongebob.csv']

Note: It could just as easily be result = re.findall(r'[^/]*\.csv', s), but for code cleanliness, I prefer naming my regexes. You might consider giving it an even clearer name in your code, like pattern_csv_basename or something like that.
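Since the question is about scanning a 6,000-line file, here is a minimal sketch of applying a named, compiled pattern to each line in a single pass. The function csv_basenames_per_line and the sample lines are hypothetical; io.StringIO stands in for a real open() file object. The pattern also excludes '|' and newline, which assumes filenames never contain either. No timings are claimed; measuring with time.time(), as the asker already does, is the way to confirm any speed-up.

```python
import io
import re

# Compiled once, outside the loop, with a descriptive name.
# Assumption: filenames never contain '/', '|' or a newline.
pattern_csv_basename = re.compile(r'[^/|\n]*\.csv')


def csv_basenames_per_line(lines):
    """Yield (line_number, [basenames]) for every line that names a .csv file."""
    for number, line in enumerate(lines, start=1):
        names = pattern_csv_basename.findall(line)
        if names:
            yield number, names


# Made-up data; in real use this would be open('your_file.csv').
fake_file = io.StringIO(
    '1,fx|../zone_source/dlc3.csv|csv|../zone_source/spongebob.csv|csv\n'
    '2,fx|weapon|no files on this line\n'
    '3,fx|../zone_source/other.csv|csv\n'
)

for number, names in csv_basenames_per_line(fake_file):
    print(number, names)
# 1 ['dlc3.csv', 'spongebob.csv']
# 3 ['other.csv']
```

Python's re module also caches recently used patterns, so compiling up front is mostly about readability rather than a large speed gain.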

Docs: re, including re.findall

See also: The official Python Regular Expression HOWTO
