Sum two columns only if the value in one column is greater than 0

I've got the following dataframe:

import pandas as pd
lst=[['01012021','',100],['01012021','','50'],['01022021',140,5],['01022021',160,12],['01032021','',20],['01032021',200,25]]
df1=pd.DataFrame(lst,columns=['Date','AuM','NNA'])

I am looking for code that sums the columns AuM and NNA only if column AuM contains a value. The expected result is shown below:

lst=[['01012021','',100,''],['01012021','','50',''],['01022021',140,5,145],['01022021',160,12,172],['01032021','',20,'']]
df2=pd.DataFrame(lst,columns=['Date','AuM','NNA','Sum'])

Thank you for your help.

CodePudding user response:

It is not good practice to use '' in place of NaN when you have numeric data, because the column is then stored as object dtype rather than as numbers.
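
To illustrate the point, here is a minimal sketch comparing the two placeholders. The names with_blank, with_nan and shifted are made up for this example and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# The same values stored two ways: '' as the placeholder vs. NaN as the placeholder.
with_blank = pd.Series(['', 140, 160])
with_nan = pd.Series([np.nan, 140, 160])

print(with_blank.dtype)  # object
print(with_nan.dtype)    # float64

# NaN simply propagates through arithmetic, so the numeric rows still work.
# The same operation on with_blank would fail, since Python cannot add '' and 5.
shifted = with_nan + 5
print(shifted)
```

With NaN the column stays numeric, so sums, comparisons and aggregations behave as expected without a conversion step.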

That said, a generic solution to your issue would be to use sum with the skipna=False option:

df1['Sum'] = (df1[['AuM', 'NNA']]                     # list as many columns as you want
              .apply(pd.to_numeric, errors='coerce')  # convert to numeric; '' becomes NaN
              .sum(axis=1, skipna=False)              # row sum is NaN if any value is NaN
              .fillna('')                             # fill NaN with empty string (bad practice)
             )

output:

       Date  AuM  NNA    Sum
0  01012021       100       
1  01012021        50       
2  01022021  140    5  145.0
3  01022021  160   12  172.0
4  01032021        20       
5  01032021  200   25  225.0
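
If you are free to keep NaN in the result, the same approach can be sketched without the final fillna. This is only a sketch built on the question's sample data; the intermediate name numeric is an assumption made for readability.

```python
import pandas as pd

lst = [['01012021', '', 100], ['01012021', '', '50'], ['01022021', 140, 5],
       ['01022021', 160, 12], ['01032021', '', 20], ['01032021', 200, 25]]
df1 = pd.DataFrame(lst, columns=['Date', 'AuM', 'NNA'])

# Convert both columns to numbers: '' becomes NaN and the string '50' becomes 50.
numeric = df1[['AuM', 'NNA']].apply(pd.to_numeric, errors='coerce')

# skipna=False makes the row sum NaN as soon as one of the values is missing.
df1['Sum'] = numeric.sum(axis=1, skipna=False)
print(df1)
```

The Sum column then stays float64, with NaN marking the rows where AuM is empty.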

CodePudding user response:

I assume you mean to include the last row too:

df2 = (df1.assign(Sum=df1.loc[df1.AuM.ne(""), ["AuM", "NNA"]].sum(axis=1))
          .fillna(""))
print(df2)

Result:

       Date  AuM  NNA    Sum
0  01012021       100       
1  01012021        50       
2  01022021  140    5  145.0
3  01022021  160   12  172.0
4  01032021        20       
5  01032021  200   25  225.0
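
If the title is meant literally (sum only where AuM is greater than 0), one possible variant is to convert the columns first and then mask the sum with Series.where. This is a sketch under that assumption, not necessarily what was asked for; the names aum and nna are made up here.

```python
import pandas as pd

lst = [['01012021', '', 100], ['01012021', '', '50'], ['01022021', 140, 5],
       ['01022021', 160, 12], ['01032021', '', 20], ['01032021', 200, 25]]
df1 = pd.DataFrame(lst, columns=['Date', 'AuM', 'NNA'])

# Convert to numbers so the comparison and the addition are well defined.
aum = pd.to_numeric(df1['AuM'], errors='coerce')
nna = pd.to_numeric(df1['NNA'], errors='coerce')

# Keep the sum only where AuM > 0. Comparisons with NaN are False,
# so rows with an empty AuM are masked out as well.
df1['Sum'] = (aum + nna).where(aum > 0)
print(df1)
```

On the sample data this gives the same sums as above, but it would also blank out rows where AuM is zero or negative.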