I am working on a program that compares entries in cataloguing software (Rucio) with the files in storage. From the catalogue, I get the path where it believes the file is stored. I then check that location to see whether the file exists there. I have a working Bash script that does this, but it would be much better to redo it in Python.
The problem I have encountered is that Python does not find the files, even when I know they exist. I have tried things like
    from os import path
    if path.exists(fulladdress):
        pass  # do stuff
Even when I give it a file I know exists, it does not find it. I suspect this is because the folder is huge (over 100 TB and over 287,000 files), so it does not search the whole folder and therefore does not find the file.
Is there a Python solution that works for folders that big?
Best regards Piotr
The Bash command that works (called through os.system) is:
    os.system("cd; cd directory_with_files; test -e file_in_directory_exist && echo filename >> found.txt || echo filename >> not_found")
I tried running this:
    import os

    def findfile(name, path):
        for dirpath, dirname, filename in os.walk(path):
            if name in filename:
                return os.path.join(dirpath, name)

    def compere_checksum(not_missing_files):
        not_missing_files_file = open(not_missing_files, 'r')
        lines_not_missing_files_file = not_missing_files_file.readlines()
        # Extract a list of files I know exist
        for line in lines_not_missing_files_file:
            line.replace(' ','')
            line_list = line.split(",")
            address = line_list[0].replace("LUND: file://", "")
            # address = path to the folder
            fille = address[address.rindex('/') + 1:]
            # fille = the name of the file
            address = address.replace(fille, "")
            # search for the file using bash
            os.system("test -e {} && echo Found {}".format(line_list[0], fille))
            # search for the file using the Python function above
            filepath = findfile(address, fille)
            print(filepath)
address is something along the lines of "/projects/dir/dir/dir/dir/dir/mc20/v12/4.0GeV/v2.2.1-3e/"
and fille looks like this: "mc_v12-4GeV-3e-inclusive_run1310195_t1601591250.root"
The script returns:
Found mc_v12-4GeV-3e-inclusive_run1310220_t1601591602.root
None
Found mc_v12-4GeV-3e-inclusive_run1310246_t1601592829.root
None
Found mc_v12-4GeV-3e-inclusive_run1310247_t1601591229.root
None
Found mc_v12-4GeV-3e-inclusive_run1310248_t1601591216.root
None
Found mc_v12-4GeV-3e-inclusive_run1310249_t1601591416.root
None
Found mc_v12-4GeV-3e-inclusive_run1310250_t1601591472.root
None
So the Bash command finds the file, but the Python function does not.
UPDATE: Solved
    with open(file) as f:
finds the file. I don't know why this works and the other approaches don't, but whatever.
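The open-based check from the update can be wrapped into a small helper that sorts paths into found and missing, roughly mirroring what the shell command does with found.txt and not_found. This is only a sketch: the name classify_paths is made up for illustration, and stripping whitespace from each path is an assumption about how the input lines look, not something confirmed in the post.

```python
def classify_paths(paths):
    """Split candidate paths into (found, missing) by trying to open each one."""
    found, missing = [], []
    for raw in paths:
        candidate = raw.strip()  # drop stray spaces and trailing newlines
        try:
            # Opening in binary read mode only checks that the file is readable.
            with open(candidate, "rb"):
                found.append(candidate)
        except OSError:
            # Covers missing files, permission errors and directories.
            missing.append(candidate)
    return found, missing
```

Note that a path which exists but cannot be read (or is a directory) ends up in the missing list here, so the two lists mean "readable file" and "everything else".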
CodePudding user response:
    import os

    def findfile(name, path):
        for dirpath, dirname, filename in os.walk(path):
            if name in filename:
                return os.path.join(dirpath, name)

    filepath = findfile("file2.txt", "/")
    print(filepath)
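One thing worth checking against the question's code: findfile is defined as findfile(name, path), while the question calls findfile(address, fille), that is, directory first and file name second. With the arguments in that order, os.walk is asked to walk a path that does not exist, yields nothing, and the function returns None. Since the full path is already known, a direct check also avoids walking the tree at all. A minimal sketch using a temporary directory (the file name below is made up for illustration):

```python
import os
import tempfile

def findfile(name, path):
    # `name` is the file name to look for, `path` the directory to walk.
    for dirpath, dirnames, filenames in os.walk(path):
        if name in filenames:
            return os.path.join(dirpath, name)
    return None

with tempfile.TemporaryDirectory() as folder:
    name = "example_run.root"  # hypothetical file name
    open(os.path.join(folder, name), "w").close()

    right_order = findfile(name, folder)  # file name first, directory second
    wrong_order = findfile(folder, name)  # swapped: os.walk(name) yields nothing
    direct = os.path.isfile(os.path.join(folder, name))  # no tree walk needed

    print(right_order)  # full path to the file
    print(wrong_order)  # None
    print(direct)       # True
```

For a folder with hundreds of thousands of files, the direct os.path.isfile check on the full path is likely the cheaper option, because it asks about one path instead of listing every directory.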
CodePudding user response:
I can use:
    with open(file) as f:
        pass  # do stuff
I don't know why this works and
    path.exists
or
    def findfile(name, path):
        for dirpath, dirname, filename in os.walk(path):
            if name in filename:
                return os.path.join(dirpath, name)
does not, but whatever; as long as it works, it is fine.
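On why path.exists can disagree with open: os.path.exists does not scan the directory, so the size of the folder should not matter. It returns False whenever the underlying os.stat call fails, which includes reasons other than the file being absent (a permission problem, for example). A stray character in the path string, such as a trailing newline from readlines() or a leftover prefix, also makes it return False. Which of these, if any, was the cause here cannot be told from the post. The helper below (explain_path is a made-up name) is only a sketch for surfacing the actual reason:

```python
import os

def explain_path(path):
    """Return a short description of what os.stat reports for `path`."""
    if path.strip() != path:
        # repr() makes hidden characters such as '\n' visible.
        return "path has surrounding whitespace: {!r}".format(path)
    try:
        os.stat(path)
    except OSError as err:
        # os.path.exists would just return False here; the exception says why.
        return "{}: {}".format(type(err).__name__, err.strerror)
    return "exists"
```

Calling explain_path(fulladdress) right before the path.exists check should show whether the string itself is off or whether the operating system reports an error for it.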