extracting a filepath as a string using a global recursive search?

Time:04-22

Apologies if this is a basic question, but I couldn't find a clear-cut solution. I'm using a global recursive search to find a file with a specific extension in a directory and its subdirectories, like so:

my code

bam = list(Path('path/to/file').rglob("*.bam"))

This returns something like:

[PosixPath('path/to/file/file.bam')]

However, I want to extract just the file path, so that the bam variable is a string containing just the file path, i.e.

bam = 'path/to/file/file.bam'.

I realize I could probably convert the current output to a string and then use a regex to extract everything between the quotes, but I'm hoping there's a more elegant way, or even a simpler way to recursively search for files with different extensions and output each file path as a string!

As always, any help is appreciated!

CodePudding user response:

rglob returns a generator that yields path objects, which are instances of a PurePath subclass (here, PosixPath). Those objects expose the actual pathname through their implementation of __str__(). Therefore:

from pathlib import Path

for p in Path('path/to/file').rglob('*.bam'):
    print(p)

...will almost certainly give what you're looking for.

Bear in mind that print() implicitly calls str() on each object it is given. If you need the file paths as strings in a list, you have to call str() explicitly. For example:

lop = [str(p) for p in Path('path/to/file').rglob('*.bam')]
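The question also asks about searching for several extensions at once. One possible approach, sketched here with a hypothetical helper named find_files (not part of pathlib), is to walk everything with rglob("*") and filter on each path's suffix. The temporary directory and file names are made up purely for the demo:

```python
import tempfile
from pathlib import Path


def find_files(root, extensions):
    """Return sorted path strings for files under root whose suffix is in extensions.

    Hypothetical helper for illustration; suffixes are compared case-insensitively.
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


# Demo in a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.bam").touch()
    (root / "sub" / "b.sam").touch()
    (root / "sub" / "notes.txt").touch()

    found = find_files(root, [".bam", ".sam"])
    print(found)  # two plain strings; notes.txt is filtered out
```

Filtering on suffix keeps it to a single directory walk; calling rglob once per pattern and concatenating the results would work too.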

CodePudding user response:

What you're getting as output is a list of PosixPaths.

PosixPath is a class from Python's pathlib module. It's an object that represents a filesystem path rather than a plain string, so you get useful path-specific attributes and methods (such as .name, .suffix and .parent). Depending on what you're doing next, it may be handier to keep it like this!
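As a rough illustration of why keeping the path object can be handy, here is a small sketch using the example path from the question. It uses PurePosixPath only so that the output is the same on every platform; the objects returned by rglob offer the same attributes:

```python
import os
from pathlib import PurePosixPath

p = PurePosixPath("path/to/file/file.bam")

print(p.name)         # file.bam
print(p.suffix)       # .bam
print(p.parent)       # path/to/file
print(str(p))         # path/to/file/file.bam
print(os.fspath(p))   # path/to/file/file.bam
```

Most standard-library functions that take a file name (open(), for instance) accept path objects directly, so converting to a string is often only needed when another tool insists on one.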

To solve your stated problem, you'll need to access the first PosixPath in your list and convert it to a string. Note, however, that this will only give you the first match if there is more than one file with that filetype in the directory.

matches = list(Path('path/to/file').rglob("*.bam"))
bam = str(matches[0])
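Note that matches[0] raises an IndexError when nothing matches. If that can happen, one option is to take the first item from the generator with a default instead of building a list. This is only a sketch: first_match is a hypothetical helper name, and the temporary directory exists just to make the demo runnable:

```python
import tempfile
from pathlib import Path


def first_match(root, pattern):
    """Return the first path under root matching pattern as a str, or None."""
    match = next(Path(root).rglob(pattern), None)
    return None if match is None else str(match)


with tempfile.TemporaryDirectory() as tmp:
    missing = first_match(tmp, "*.bam")   # nothing there yet, so None
    (Path(tmp) / "file.bam").touch()
    bam = first_match(tmp, "*.bam")       # now a plain string
    print(missing, bam)
```

Using next() also stops the search as soon as one file is found, which can save time on large directory trees.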