I have two DataFrames. The first, df, contains customer data but not the customer's size. The size can be taken from the other DataFrame, dfCust. How can I assign the customer size from dfCust to the existing CustSize column in df?
Conditions: df.CustID == dfCust.ID OR df.CustVAT == dfCust.VAT.
The problem is that the DataFrames are not the same size, neither is a subset of the other, and both contain many empty values.
Additional info: I replaced NaN with empty strings, but I'm not sure that's the right approach. My dummy DataFrame code is below.
import pandas as pd

df = pd.DataFrame({"CustID": ['a1', 'b1', 'c2', '', 'e1', '', 'g5', ''],
                   "CustVAT": ["123", "", "321", "333", "444", "", "", "777"],
                   "CustName": ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   "OtherInfo": ['A1', 'B1', 'C1', 'D1', 'E1', 'F1', 'G1', 'H1'],
                   "CustSize": ['', '', '', '', '', '', '', '']})
dfCust = pd.DataFrame({"ID": ['', 'e1', '', 'g5', '', 'i6', 'j1'],
                       "VAT": ["333", "444", "", "", "777", '888', '999'],
                       "Name": ['D', 'E', 'F', 'G', 'H', 'I', 'J'],
                       "Size": ['small', 'big', 'medium', 'big', 'medium', 'medium', 'small']})
CodePudding user response:
I am still not sure exactly what you are asking, but given the two DataFrames df and dfCust shown below:
df =
   CustID  CustVAT  CustName  OtherInfo
0      a1      123         A         A1
1      b1                  B         B1
2      c2      321         C         C1
3              333         D         D1
4      e1      444         E         E1
5                          F         F1
6      g5                  G         G1
7              777         H         H1
and dfCust=
   ID  VAT  Name    Size
0      333     D   small
1  e1  444     E     big
2              F  medium
3  g5          G     big
4      777     H  medium
5  i6  888     I  medium
6  j1  999     J   small
You can fill the df['CustSize'] column with the Size value from dfCust whenever either CustID or CustVAT matches, using the following:
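import numpy as np

# Function to select size data: try the ID first, then fall back to the VAT
def reportSize(dfc, cid, cvat):
    # Empty strings are treated as "no key", so they never match
    if cid != '' and cid in dfc['ID'].to_list():
        # .iloc[0] takes the first match if a key appears more than once
        return dfc.loc[dfc['ID'] == cid, 'Size'].iloc[0]
    if cvat != '' and cvat in dfc['VAT'].to_list():
        return dfc.loc[dfc['VAT'] == cvat, 'Size'].iloc[0]
    return np.nan  # neither key was found in dfc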
Then execute:
df['CustSize'] = df.apply(lambda row: reportSize(dfCust, row.CustID, row.CustVAT), axis = 1)
Which yields:
   CustID  CustVAT  CustName  OtherInfo  CustSize
0      a1      123         A         A1       NaN
1      b1                  B         B1       NaN
2      c2      321         C         C1       NaN
3              333         D         D1     small
4      e1      444         E         E1       big
5                          F         F1       NaN
6      g5                  G         G1       big
7              777         H         H1    medium
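If the row-wise apply turns out to be slow on larger frames, the same "ID first, then VAT" rule can probably be expressed with Series.map. This is only a sketch under a few assumptions: empty strings mean "no key", a repeated key in dfCust resolves to its first row, and the helper name lookup is made up for illustration. The frames are trimmed to the columns the lookup needs.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the question, reduced to the relevant columns
df = pd.DataFrame({
    "CustID": ['a1', 'b1', 'c2', '', 'e1', '', 'g5', ''],
    "CustVAT": ["123", "", "321", "333", "444", "", "", "777"],
})
dfCust = pd.DataFrame({
    "ID": ['', 'e1', '', 'g5', '', 'i6', 'j1'],
    "VAT": ["333", "444", "", "", "777", '888', '999'],
    "Size": ['small', 'big', 'medium', 'big', 'medium', 'medium', 'small'],
})

def lookup(key_col):
    # Hypothetical helper: build a key -> Size Series for one key column.
    # Empty keys are dropped and duplicates reduced to the first row,
    # because Series.map needs a uniquely indexed mapper.
    valid = dfCust[dfCust[key_col] != ''].drop_duplicates(subset=key_col)
    return valid.set_index(key_col)['Size']

by_id = df['CustID'].map(lookup('ID'))     # NaN where the ID is unknown or empty
by_vat = df['CustVAT'].map(lookup('VAT'))  # NaN where the VAT is unknown or empty

# Prefer the ID match and fall back to the VAT match
df['CustSize'] = by_id.fillna(by_vat)
print(df)
```

On the dummy data this should give the same CustSize values as the apply version, since both prefer an ID match over a VAT match.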
You can use the standard pandas method df.fillna() to replace the NaN values with whatever you need for later processing.
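As a small illustration of that last step, the snippet below fills the missing sizes with a placeholder. The label 'unknown' is just an assumed example value, and the Series stands in for the CustSize column produced above.

```python
import numpy as np
import pandas as pd

# Stand-in for the CustSize column, with NaN where no customer matched
sizes = pd.Series(['small', np.nan, 'big', np.nan], name='CustSize')

# Replace every NaN with a placeholder label ('unknown' is an arbitrary choice)
filled = sizes.fillna('unknown')
print(filled.tolist())
```

Keeping NaN until the very end is usually convenient, because methods such as isna() and fillna() recognise NaN but not empty strings.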