I have a Python program that lists paths, both files and folders, stored in a cloud data lake. I need a way to separate the files from the folders. This is the list of paths:
Project 1/Example1/SomeFolder
Project 1/Example1/SomeFolder/example1.2
Project 1/Example1/SomeFolder/example1.2/example_r_01.txt
Project 1/Example1/SomeFolder/example1.2/example_r_02.txt
I want to filter the list above so that it contains only files (of any kind: .txt, .xlsx, .cad, .csv, etc.). For the example above I need:
Project 1/Example1/SomeFolder/example1.2/example_r_01.txt
Project 1/Example1/SomeFolder/example1.2/example_r_02.txt
I tried os.path.isfile() and os.path.isdir(), but these only work when the paths exist on the local filesystem, and the paths I have are not present in my local environment. Is there any custom logic I could implement? Thanks.
CodePudding user response:
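Here is a solution using regex. The pattern looks for a 3- or 4-letter extension at the end of each path. Note that it will miss files with shorter or longer extensions (such as .py or .parquet) or with no extension at all, and it will treat a folder whose name ends in such an extension as a file.
For example, it matches: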
.xls
.xlsx
.txt
.cad
.csv
...
Code:
import re

path_list = ['Project 1/Example1/SomeFolder',
             'Project 1/Example1/SomeFolder/example1.2',
             'Project 1/Example1/SomeFolder/example1.2/example_r_01.txt',
             'Project 1/Example1/SomeFolder/example1.2/example_r_02.txt',
             'Project 1/Example1/SomeFolder/example1.2/example_r_03.xlsx']

# Only print the files (not folders)
for item in path_list:
    if re.search(r'\.[A-Za-z]{3,4}$', item):
        print(item)
Output:
Project 1/Example1/SomeFolder/example1.2/example_r_01.txt
Project 1/Example1/SomeFolder/example1.2/example_r_02.txt
Project 1/Example1/SomeFolder/example1.2/example_r_03.xlsx
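If you would rather not depend on extensions at all, a different heuristic is to treat a path as a folder whenever another listed path sits beneath it. The following is only a minimal sketch of that idea, not something from the question: the function name split_files_and_folders is made up for illustration, and it assumes the listing is recursive, uses "/" as the separator, and marks folders with a trailing "/" if at all. An empty folder without a trailing slash would still be reported as a file, so check against real data before relying on it.

```python
def split_files_and_folders(paths):
    """Hypothetical helper: a path is a folder if another listed path lies beneath it."""
    folders = set()
    for p in paths:
        # Assumption: a trailing slash explicitly marks a folder.
        if p.endswith("/"):
            folders.add(p.rstrip("/"))
        parts = p.rstrip("/").split("/")
        # Every proper ancestor of a listed path must be a folder.
        for i in range(1, len(parts)):
            folders.add("/".join(parts[:i]))
    normalized = [p.rstrip("/") for p in paths]
    files = [p for p in normalized if p not in folders]
    listed_folders = [p for p in normalized if p in folders]
    return files, listed_folders


path_list = ['Project 1/Example1/SomeFolder',
             'Project 1/Example1/SomeFolder/example1.2',
             'Project 1/Example1/SomeFolder/example1.2/example_r_01.txt',
             'Project 1/Example1/SomeFolder/example1.2/example_r_02.txt',
             'Project 1/Example1/SomeFolder/example1.2/example_r_03.xlsx']

files, folders = split_files_and_folders(path_list)
for item in files:
    print(item)
```

With the list above this prints the same three file paths as the regex version, and it classifies example1.2 as a folder because other paths are listed under it, without looking at its name.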
CodePudding user response:
from os import path

# Note: path.exists() checks the local filesystem only.
def main():
    print("File exists: " + str(path.exists('guru99.txt')))
    print("File exists: " + str(path.exists('career.guru99.txt')))
    print("directory exists: " + str(path.exists('myDirectory')))

if __name__ == "__main__":
    main()