Regex to extract the filename fails for filenames without path [duplicate]

Time:09-30

I have a regex that extracts the file name from a path, but if the input has no directory part it raises an error.

Code:

import re

PATH = r'.\var\pth\index.txt'
filename = re.search('.*/(.*)\\.\\w+', PATH).group(1)
print(filename)

The regex above has been tested on these examples:

  • .\var\pth\index.txt
  • .\index.php
  • index.txt (FAILED)

If I just pass index.txt then I get the error

AttributeError: 'NoneType' object has no attribute 'group'

Is there a way to modify the regex above so that it accepts any path and extracts the filename?

CodePudding user response:

All too often, regex is the wrong tool. Just use either pathlib.Path for an OO approach or os.path.basename for a more functional approach:

In [1]: import os.path

In [2]: os.path.basename("/foo/bar/baz")
Out[2]: 'baz'

In [3]: os.path.basename??
Signature: os.path.basename(p)
Source:   
def basename(p):
    """Returns the final component of a pathname"""
    p = os.fspath(p)
    sep = _get_sep(p)
    i = p.rfind(sep) + 1
    return p[i:]
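
As a minimal sketch of the pathlib route mentioned above, using the sample paths from the question. It assumes the inputs use Windows-style backslashes, which is why PureWindowsPath and ntpath are used here: both parse backslash paths the same way on any OS, whereas os.path.basename only splits on the separator of the platform it runs on.

```python
from pathlib import PureWindowsPath
import ntpath

# Sample inputs from the question; raw strings keep the backslashes literal.
paths = [r".\var\pth\index.txt", r".\index.php", "index.txt"]

# .name is the final path component; .stem is that component minus its last suffix.
names = [PureWindowsPath(p).name for p in paths]
stems = [PureWindowsPath(p).stem for p in paths]

# ntpath is the Windows flavour of os.path and can be imported on every platform.
basenames = [ntpath.basename(p) for p in paths]

print(names)
print(stems)
print(basenames)
```

A bare file name such as index.txt goes through the same call as a full path, so no special case is needed for inputs without a directory part.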

CodePudding user response:

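Another approach is to split on the last separator with rsplit, i.e. filename = PATH.rsplit("\\", 1)[-1]. If the string contains no separator, rsplit returns it unchanged as the only element, so a bare file name also works: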

PATH = "\\var\\pth\\index.txt"
filename = PATH.rsplit("\\", 1)[-1]
print(filename)
PATH = "index.php"
filename = PATH.rsplit("\\", 1)[-1]
print(filename)

Output:

index.txt
index.php

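To split on the native separator of whichever platform the code runs on, use os.sep. This only recognizes that one separator, so the "/" path below is split as shown only where os.sep is "/" (for example on Linux):

import os
PATH = "/var/pth/index.txt"
filename = PATH.rsplit(os.sep, 1)[-1]
print(filename)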

Output:

index.txt
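
If a regex is still preferred, one possible fix is to make the directory part optional, since the pattern in the question requires a separator before the name and therefore cannot match a bare file name. The sketch below is only an illustration under that assumption: FILENAME_RE and extract_name are hypothetical names, and the character class [\\/] is chosen so that both separator styles are accepted. Like the pattern in the question, it captures the name without its extension.

```python
import re

# Optional directory prefix ending in "\" or "/", then the name, then ".ext".
FILENAME_RE = re.compile(r"(?:.*[\\/])?(.*)\.\w+$")

def extract_name(path):
    """Return the file name without its extension, or None if nothing matches."""
    match = FILENAME_RE.search(path)
    # Checking for None avoids the AttributeError from calling .group on a failed match.
    return match.group(1) if match else None

for p in [r".\var\pth\index.txt", r".\index.php", "index.txt", "/var/pth/index.txt"]:
    print(p, "->", extract_name(p))
```

A name with no extension at all (for example README) does not match this pattern, so extract_name returns None for it rather than raising.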