Home > database >  extract values from columns based on another column values
extract values from columns based on another column values

Time:09-14

I have a Dataframe df like the following:

                    Warehouse        Date                Count
0     Delhivery Goa Warehouse     2022-05-12                83
1     Delhivery Goa Warehouse     2022-05-15                 1
2     Delhivery Goa Warehouse     2022-05-18               100
3     Delhivery Tauru Warehouse   2022-05-19               100
4     Delhivery Tauru Warehouse   2022-05-20               100

and another dataframe df_orig like the following:

              index                          Goa    Tauru    
0     2022-05-12Delhivery Goa Warehouse     100.0     0.0   
1     2022-05-15Delhivery Goa Warehouse     100.0     0.0   
2     2022-05-18Delhivery Goa Warehouse     100.0     0.0   
3     2022-05-20Delhivery Tauru Warehouse    0.0     50.0   
4     2022-05-19Delhivery Tauru Warehouse    0.0     70.0   

How can I pick values from the df_orig columns based on combination of warehouse and Date columns of the df?

Expected output:

                    Warehouse        Date                Count      original
0     Delhivery Goa Warehouse     2022-05-12                83       100
1     Delhivery Goa Warehouse     2022-05-15                 1       100
2     Delhivery Goa Warehouse     2022-05-18               100       100
3     Delhivery Tauru Warehouse   2022-05-19               100       70
4     Delhivery Tauru Warehouse   2022-05-20               100       50

My approach:

df['index1'] = str(df['Date'])   str(df['Warehouse'])
original = []
for index, row in df.iterrows():
    if row['index1'] == df_orig['index']:
        original.append(????)

CodePudding user response:

Concatenate Date and Warehouse columns in first dataframe, and calculate the sum of values of all the columns except first using iloc[:,1:], then merge two dataframes, and finally take only the columns of interest:

(df
.assign(index=df['Date']   df['Warehouse'])
.merge(df_orig.assign(original=df_orig.iloc[:,1:].sum(1)))
)[['Warehouse', 'Date', 'Count', 'original']]

OUTPUT:


                   Warehouse        Date  Count  original
0    Delhivery Goa Warehouse  2022-05-12     83     100.0
1    Delhivery Goa Warehouse  2022-05-15      1     100.0
2    Delhivery Goa Warehouse  2022-05-18    100     100.0
3  Delhivery Tauru Warehouse  2022-05-19    100      70.0
4  Delhivery Tauru Warehouse  2022-05-20    100      50.0

CodePudding user response:

merge() works here. You can also map the sum of the values in df_orig to df rows using map() method. As index column in df_orig is the same as the concatenation of Date and Warehouse columns in df, first concatenate those columns to make the mapping keys match.

# map the sum of the values in df_orig to df.Warehouse via df_orig.index
df['original'] = (df['Date'].astype(str) df['Warehouse']).map(df_orig.set_index('index').sum(1))

res

  • Related