I have an existing loop that I am using to go through a large number of file paths, which ultimately sends the files through a cloud processing pipeline. I need to update the loop to match the file names with a dataframe column (`fileName`), then get the associated data values from a second column (`date`) and store that as a variable in my loop.
```python
import pandas as pd

# dataframe that I need to extract 'date' from
df = pd.DataFrame({'id': ['dat1', 'dat2', 'dat3'],
                   'date': [2019, 2021, 2015],
                   'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})

# list of file paths that I need the fileName from to match with my dataframe
gs_files = ['path/dat1.file', 'path/dat2.file']
bucket = 'path/'

for f in gs_files:
    # get file path
    print('Path: ', f)
    # get file name (need to keep this for later processing steps)
    fbname = f.replace(bucket, '')
    print('Image name: ', fbname)
    # match fbname with df['fileName']. Store associated 'date' as a separate variable (not as a column in df)
    if fbname in df['fileName']:
        year = df['date']
        print('Collection date: ', year)
    # Extra processing steps will be executed below.
```
Resulting output from the above code:

```
Path: path/dat1.file
Image name: dat1.file
Path: path/dat2.file
Image name: dat2.file
```
Desired output:

```
Path: path/dat1.file
Image name: dat1.file
Collection date: 2019
Path: path/dat2.file
Image name: dat2.file
Collection date: 2021
```
Answer:
Change this code:

```python
if fbname in df['fileName']:
    year = df['date']
    print('Collection date: ', year)
```

to this:

```python
if df['fileName'].isin([fbname]).any():
    year = df['date'][df['fileName'] == fbname].iloc[0]
    print('Collection date: ', year)
```
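For reference, here is a self-contained sketch of the question's loop with that change applied, using the sample data from the question. The `collected` dict is a hypothetical addition (not part of the original code) that only records each matched year so the result can be checked after the loop.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'id': ['dat1', 'dat2', 'dat3'],
                   'date': [2019, 2021, 2015],
                   'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})
gs_files = ['path/dat1.file', 'path/dat2.file']
bucket = 'path/'

collected = {}  # hypothetical: keeps each matched year for inspection afterwards
for f in gs_files:
    print('Path: ', f)
    fbname = f.replace(bucket, '')
    print('Image name: ', fbname)
    # isin() compares against the column's values, not its index
    if df['fileName'].isin([fbname]).any():
        # the boolean mask keeps the matching row(s); .iloc[0] pulls out the scalar
        year = df['date'][df['fileName'] == fbname].iloc[0]
        print('Collection date: ', year)
        collected[fbname] = year
```

After the loop, `year` holds a single number per file rather than the whole `date` column, so it can be passed on to the later processing steps.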
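`fbname in df['fileName']` doesn't work, because `in` on a pandas Series checks the Series' index labels (here `0`, `1`, `2`), not its values, so with the default integer index it is `False` for every file name. Instead, `df['fileName'].isin([fbname])` returns a boolean Series that is `True` for each item in the original column that's in the list you specify (`[fbname]`), and `False` otherwise. Then `.any()` returns `True` if there is at least one `True` in the Series it's called on.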
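A small sketch of the difference between `in` and `isin()`, using a standalone Series built from the question's `fileName` values (the variable names `s` and `mask` are just for illustration):

```python
import pandas as pd

s = pd.Series(['dat1.file', 'dat2.file', 'dat3.file'])

# `in` on a Series tests the index labels (0, 1, 2), not the values
name_in_index = 'dat1.file' in s   # False
zero_in_index = 0 in s             # True

# isin() tests the values and returns a boolean Series of the same length
mask = s.isin(['dat1.file'])
print(mask.tolist())  # [True, False, False]

# .any() collapses the mask to a single True/False
print(mask.any())     # True

# a plain-Python value check also works, e.g. via a list of the values
name_in_values = 'dat1.file' in s.tolist()  # True
```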
Also, `df['date'][df['fileName'] == fbname]` selects the items from `date` where `fileName` equals `fbname`, and `.iloc[0]` takes the first (here, the only) matching value out of that Series as a single scalar.
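Since the loop goes over a large number of paths, an alternative (not what the answer above uses) is to build a plain Python dict from `fileName` to `date` once, before the loop, and then look each name up directly instead of scanning the column on every iteration. This is a minimal sketch assuming file names in `df` are unique; with duplicates, `dict(zip(...))` silently keeps the last one. The extra path `'path/missing.file'` is hypothetical and only shows what happens when a file has no row in `df`.

```python
import pandas as pd

df = pd.DataFrame({'id': ['dat1', 'dat2', 'dat3'],
                   'date': [2019, 2021, 2015],
                   'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})
# 'path/missing.file' is a hypothetical path with no matching row in df
gs_files = ['path/dat1.file', 'path/dat2.file', 'path/missing.file']
bucket = 'path/'

# Build the lookup table once; assumes fileName values are unique
date_by_name = dict(zip(df['fileName'], df['date']))

years = {}
for f in gs_files:
    fbname = f.replace(bucket, '')
    year = date_by_name.get(fbname)  # None when the file has no row in df
    if year is not None:
        print('Collection date: ', year)
        years[fbname] = year
```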