I've got the following dataframe
lst=[['01012021','',100],['01012021','','50'],['01022021',140,5],['01022021',160,12],['01032021','',20],['01032021',200,25]]
df1=pd.DataFrame(lst,columns=['Date','AuM','NNA'])
I am looking for a code which sums the columns AuM and NNA only if the values of column AuM contains a value. The result is showed below:
lst=[['01012021','',100,''],['01012021','','50',''],['01022021',140,5,145],['01022021',160,12,172],['01032021','',20,'']]
df2=pd.DataFrame(lst,columns=['Date','AuM','NNA','Sum'])
Thank you for your help.
CodePudding user response:
It is not a good practice to use ''
in place of NaN when you have numeric data.
That said, a generic solution to your issue would be to use sum
with the skipna=False
option:
df1['Sum'] = (df1[['AuM', 'NNA']] # you can use as many columns as you want
.apply(pd.to_numeric, errors='coerce') # convert to numeric
.sum(1, skipna=False) # sum if all are non-NaN
.fillna('') # fill NaN with empty string (bad practice)
)
output:
Date AuM NNA Sum
0 01012021 100
1 01012021 50
2 01022021 140 5 145.0
3 01022021 160 12 172.0
4 01032021 20
5 01032021 200 25 225.0
CodePudding user response:
I assume you mean to include the last row too:
df2 = (df1.assign(Sum=df1.loc[df1.AuM.ne(""), ["AuM", "NNA"]].sum(axis=1))
.fillna(""))
print(df2)
Result:
Date AuM NNA Sum
0 01012021 100
1 01012021 50
2 01022021 140 5 145.0
3 01022021 160 12 172.0
4 01032021 20
5 01032021 200 25 225.0