Update dataframe based on matching values in one of two dataframe columns

Time:02-16

I have two DataFrames. The first, df, contains customer data but not the customer's size. The size is available in the second DataFrame, dfCust. How can I assign the customer size from dfCust to the existing CustSize column in df?

Conditions: df.CustID == dfCust.ID OR df.CustVAT == dfCust.VAT.

The problem is that the data frames are not the same size, neither is a subset of the other, and there are many empty values.

Additional info: I replaced NaN with empty strings, but I'm not sure if that's the right approach. I'm attaching the code for my dummy data frames.

import pandas as pd

df = pd.DataFrame({"CustID": ['a1', 'b1', 'c2', '', 'e1', '', 'g5', ''],
                   "CustVAT": ["123", "", "321", "333", "444", "", "", "777"],
                   "CustName": ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   "OtherInfo": ['A1', 'B1', 'C1', 'D1', 'E1', 'F1', 'G1', 'H1'],
                   "CustSize": ['', '', '', '', '', '', '', '']})

dfCust = pd.DataFrame({"ID": ['', 'e1', '', 'g5', '', 'i6', 'j1'],
                       "VAT": ["333", "444", "", "", "777", '888', '999'],
                       "Name": ['D', 'E', 'F', 'G', 'H', 'I', 'J'],
                       "Size": ['small', 'big', 'medium', 'big', 'medium', 'medium', 'small']})

CodePudding user response:

I am still not sure exactly what you are asking, but suppose the two dataframes df and dfCust are as illustrated below. df =

    CustID  CustVAT CustName    OtherInfo
0   a1      123       A           A1
1   b1                B           B1
2   c2      321       C           C1
3           333       D           D1
4   e1      444       E           E1
5                     F           F1
6   g5                G           G1
7           777       H           H1  

and dfCust=

    ID  VAT Name    Size
0       333 D   small
1   e1  444 E   big
2           F   medium
3   g5      G   big
4       777 H   medium
5   i6  888 I   medium
6   j1  999 J   small  

You can fill the df['CustSize'] column with the size from dfCust wherever either CustID or CustVAT matches, using the following:

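import numpy as np

# Function to select size data: try the ID first, then fall back to the VAT
def reportSize(dfc, cid, cvat):
    # Empty strings mean "no value", so they must never count as a match
    if cid != '' and cid in dfc['ID'].to_list():
        # .iloc[0] takes the first match, so duplicate IDs do not raise
        return dfc.loc[dfc['ID'] == cid, 'Size'].iloc[0]
    if cvat != '' and cvat in dfc['VAT'].to_list():
        return dfc.loc[dfc['VAT'] == cvat, 'Size'].iloc[0]
    return np.nan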

Then execute:

df['CustSize'] = df.apply(lambda row: reportSize(dfCust, row.CustID, row.CustVAT), axis = 1)  

Which yields:

    CustID  CustVAT CustName    OtherInfo   CustSize
0   a1       123      A          A1         NaN
1   b1                B          B1         NaN
2   c2       321      C          C1         NaN
3            333      D          D1         small
4   e1       444      E          E1         big
5                     F          F1         NaN
6   g5                G          G1         big
7            777      H          H1         medium  
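
For larger frames, the row-wise apply above can get slow. A possible vectorized alternative is sketched below; it is not part of the original answer. It assumes that empty strings mark missing keys and that, when a key appears more than once in dfCust, the first occurrence should win. The helper name size_lookup is made up for this illustration, and only the columns needed for the lookup are reproduced.

```python
import pandas as pd

# Same dummy data as in the question (lookup-relevant columns only)
df = pd.DataFrame({
    "CustID": ['a1', 'b1', 'c2', '', 'e1', '', 'g5', ''],
    "CustVAT": ["123", "", "321", "333", "444", "", "", "777"],
})
dfCust = pd.DataFrame({
    "ID": ['', 'e1', '', 'g5', '', 'i6', 'j1'],
    "VAT": ["333", "444", "", "", "777", '888', '999'],
    "Size": ['small', 'big', 'medium', 'big', 'medium', 'medium', 'small'],
})

def size_lookup(dfc, key_col):
    """Hypothetical helper: return a Series mapping key -> Size.

    Empty keys are dropped and only the first row per key is kept
    (an assumption), so the resulting index is unique.
    """
    known = dfc[dfc[key_col] != ''].drop_duplicates(subset=key_col)
    return known.set_index(key_col)['Size']

# Look up by ID first, then fill the remaining gaps from the VAT lookup
by_id = df['CustID'].map(size_lookup(dfCust, 'ID'))
by_vat = df['CustVAT'].map(size_lookup(dfCust, 'VAT'))
df['CustSize'] = by_id.fillna(by_vat)

print(df)
```

As in the apply version, the ID match takes priority over the VAT match, and rows with no match in either column are left as NaN.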

You can use the standard pandas method df.fillna() to replace the NaN values with whatever your later processing requires.
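
As a minimal sketch of that last step, the snippet below fills the missing sizes on a small stand-in Series. The placeholder label 'unknown' is only an assumption for illustration; pick whatever value suits your downstream code.

```python
import numpy as np
import pandas as pd

# Stand-in for a CustSize column that still contains NaN after the lookup
sizes = pd.Series([np.nan, 'small', 'big', np.nan], name='CustSize')

# Replace every NaN with a placeholder label ('unknown' is an assumed choice)
filled = sizes.fillna('unknown')

print(filled.tolist())
```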
