regex: pattern fails to match what I am looking for

Time:11-02

I have the following code that tries to retrieve the name of a file from a directory based on a double \ character:

import re

string = 'I:/Etrmtest/PZMALIo4/ETRM841_FX_Deals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'
pattern = r'(?<=*\\\\)*'
re.findall(pattern,string)

The reasoning behind this is that the name of the file always comes after a double \ , so I try to match any string that is preceded by text ending with \ .

Nevertheless, when I run this code I get the following error:

error: nothing to repeat at position 4

What am I doing wrong?

Edit: The concrete output I am looking for is getting the string 'abo_st_gas_dtd_csv' as a match.

CodePudding user response:

There's a couple of things going on:

  1. Declare your string with the same r'string' notation you use for the pattern. Right now the string contains only a single backslash, because in a regular string literal '\\' is an escape sequence for one backslash.
  2. I don't think you're using * correctly. It means "repeat the immediately preceding element zero or more times", not "any string" (as in the usual shell patterns). The first *, inside the parentheses, has nothing preceding it, so the regex is invalid; hence the error you see. I think what you want is .*, i.e., any character repeated 0 or more times. Furthermore, it does not belong inside the lookbehind parentheses. A more correct regexp would be r'(?<=\\\\).*':

import re

string = r'I:/Etrmtest/PZMALIo4/ETRM841_FX_Deals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'

pattern = r'(?<=\\\\).*'

re.findall(pattern,string)
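
To see both points concretely, here is a small sketch. The names plain, raw and error_message are mine (not from the question), and the path is shortened for readability. It compares the two string literals, runs the corrected pattern against each, and reproduces the original error:

```python
import re

# Hypothetical names for illustration: the same literal with and without the r prefix.
plain = 'FO_PRE\\abo_st_gas_dtd.csv'  # '\\' is an escape: one backslash in the value
raw = r'FO_PRE\\abo_st_gas_dtd.csv'   # raw literal: both backslashes are kept

print(plain.count('\\'))  # 1
print(raw.count('\\'))    # 2

# The lookbehind requires two literal backslashes right before the match.
pattern = r'(?<=\\\\).*'
matches_plain = re.findall(pattern, plain)
matches_raw = re.findall(pattern, raw)
print(matches_plain)  # []
print(matches_raw)    # ['abo_st_gas_dtd.csv']

# The pattern from the question fails at compile time: the first * has nothing to repeat.
try:
    re.compile(r'(?<=*\\\\)*')
    error_message = ''
except re.error as exc:
    error_message = str(exc)
print(error_message)
```

If the real data contains only one backslash, keeping the regular string literal and shortening the lookbehind to r'(?<=\\).*' should work as well.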

CodePudding user response:

Your pattern is just a lookbehind, which is zero-width and, by itself, can't capture any text. I would use this re.findall approach with a capture group:

import re

string = 'I:/Etrmtest/PZMALIo4/ETRM841_FX_Deals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'
# backslash, then a name without dots, a dot, and the extension, anchored at the end
filename = re.findall(r'\\([^.]+\.\w+)$', string)[0]
print(filename)  # abo_st_gas_dtd.csv
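
As a rough illustration of the difference, the sketch below uses a shortened sample path and variable names of my own choosing. It shows a bare lookbehind returning only an empty match, while a capture group returns the filename:

```python
import re

# Shortened sample path (an assumption for illustration); one backslash in the value.
path = 'FO_PRE\\abo_st_gas_dtd.csv'

# A bare lookbehind is zero-width: it finds a position but consumes no text.
bare = re.findall(r'(?<=\\)', path)
print(bare)  # ['']

# Capture group: match the backslash, return only what follows it.
captured = re.findall(r'\\([^.]+\.\w+)$', path)
print(captured)  # ['abo_st_gas_dtd.csv']
```

Note that [^.]+ assumes the file name itself contains no dots other than the one before the extension.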

CodePudding user response:

# '\\' in a regular string literal is a single backslash in the value
files = 'I:E\\trm.csvest/PZMALIo4\\ETRM841_FX_.csvDeals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'
counter = -1  # index of the current character
my_files = []
for f in files:
    counter += 1
    if ord(f) == 92:  # 92 is the code point of '\'
        temp = files[counter + 1:len(files)]  # everything after this backslash
        temp_file = ""
        for f1 in temp:
            temp_file += f1
            # at a '.', check whether the next three characters are "csv"
            if f1 == '.' and temp[len(temp_file):len(temp_file) + 3] == "csv":
                my_files.append(temp_file + "csv")
                break
print(my_files)  # ['trm.csv', 'ETRM841_FX_.csv', 'abo_st_gas_dtd.csv']

