Within a loop, match items with values in a dataframe column, then store a separate column value as

Time:12-11

I have an existing loop that iterates over a large number of file paths and ultimately sends the files through a cloud processing pipeline. I need to update the loop so that it matches each file name against a dataframe column (fileName), then gets the associated value from a second column (date) and stores it as a variable in the loop.

import pandas as pd

# dataframe that I need to extract 'date' from
df = pd.DataFrame({'id':['dat1', 'dat2', 'dat3'],
        'date':[2019, 2021, 2015],
        'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})


# list of file paths that I need the fileName from to match with my dataframe
gs_files = ['path/dat1.file', 'path/dat2.file']
bucket = 'path/'


for f in gs_files:
    # get file path
    print('Path: ', f)

    # get file name (need to keep this for later processing steps)
    fbname = f.replace(bucket, '')
    print('Image name: ', fbname)

    # match fbname with df['fileName']. Store associated 'date' as a separate variable (not as a column in df)
    if fbname in df['fileName']:
        year = df['date']
        print('Collection date: ',year)

    # Extra processing steps will be executed below.
# Resulting output from the above code:
Path:  path/dat1.file
Image name:  dat1.file
Path:  path/dat2.file
Image name:  dat2.file

# Desired output:
Path:  path/dat1.file
Image name:  dat1.file
Collection date: 2019

Path:  path/dat2.file
Image name:  dat2.file
Collection date: 2021

CodePudding user response:

Change this code:

if fbname in df['fileName']:
    year = df['date']
    print('Collection date: ',year)

to this:

if df['fileName'].isin([fbname]).any():
    year = df['date'][df['fileName'] == fbname].iloc[0]
    print('Collection date: ',year)

fbname in df['fileName'] doesn't work because the in operator on a pandas Series checks the index labels (here 0, 1, 2), not the values. Instead, df['fileName'].isin([fbname]) returns a boolean Series containing True for each item of the original column that is in the list you specify ([fbname]) and False otherwise. Then .any() returns True if there is at least one True in that Series.
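To see the difference directly, here is a small sketch using the dataframe from the question. The variable names in_index, in_values, via_isin and label_hit are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': ['dat1', 'dat2', 'dat3'],
                   'date': [2019, 2021, 2015],
                   'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})

fbname = 'dat1.file'

# `in` on a Series tests the index labels (0, 1, 2), so this is False.
in_index = fbname in df['fileName']

# An integer index label is found, which shows what `in` really checks.
label_hit = 0 in df['fileName']

# Testing against the underlying values gives the intended answer.
in_values = fbname in df['fileName'].to_numpy()

# isin(...).any() is the value-based test used in the fix above.
via_isin = df['fileName'].isin([fbname]).any()

print(in_index, label_hit, in_values, via_isin)
```

Only the first check fails, which is why the if block in the question was never entered.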

Also, df['date'][df['fileName'] == fbname] selects the entries of date in the rows where fileName equals fbname. The result is still a Series, so .iloc[0] pulls out the first match as a plain value.
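If the loop runs over many paths, one alternative is to build the fileName-to-date lookup once, before the loop, instead of filtering the dataframe on every iteration. This is only a sketch: it assumes the fileName values are unique, and the names date_by_file and years, as well as the extra path/missing.file entry, are made up here to show the no-match case.

```python
import pandas as pd

df = pd.DataFrame({'id': ['dat1', 'dat2', 'dat3'],
                   'date': [2019, 2021, 2015],
                   'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})

# 'path/missing.file' is a hypothetical entry with no row in df.
gs_files = ['path/dat1.file', 'path/dat2.file', 'path/missing.file']
bucket = 'path/'

# Build the lookup once (assumes each fileName appears only once in df).
date_by_file = dict(zip(df['fileName'], df['date']))

years = {}  # collected only so the result can be inspected afterwards
for f in gs_files:
    print('Path: ', f)

    fbname = f.replace(bucket, '')
    print('Image name: ', fbname)

    # .get returns None when the file name has no row in df.
    year = date_by_file.get(fbname)
    if year is not None:
        print('Collection date: ', year)
        years[fbname] = year
```

For a handful of files the isin version above is just as good; the dictionary mainly helps when the list of paths is long.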
