How to sum two columns ignoring NaN?

Time:04-30

I have a dataframe:

id1    id2
a      NaN
b      c
d      e

I want to create a new column, ids, as the "sum" of id1 and id2, i.e. the two strings joined with a hyphen:

df['ids'] = df['id1'] + "-" + df['id2']

But result is:

id1    id2   ids
a      NaN   NaN
b      c     b-c
d      e     d-e

As you can see, the "sum" of a and NaN is NaN, but I would like to get a-NaN instead. How can I do that? Desired result:

id1    id2   ids
a      NaN   a-NaN
b      c     b-c
d      e     d-e
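A minimal reproduction of the problem, assuming the question's three-row frame (values copied from the table above): pandas' + on string columns propagates missing values, so the first row comes out as NaN rather than as a string.

```python
import numpy as np
import pandas as pd

# The question's frame: id2 is missing in the first row.
df = pd.DataFrame({"id1": ["a", "b", "d"], "id2": [np.nan, "c", "e"]})

# Element-wise string concatenation; a NaN operand makes that row NaN.
df["ids"] = df["id1"] + "-" + df["id2"]
print(df)
```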

Answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({"id1":['a', 'b', 'c'], "id2":[np.nan, 'c', 'e']})
df['ids'] = df.id1.replace(np.nan, 'NaN') + "-" + df.id2.replace(np.nan, 'NaN')
print(df)

  id1  id2    ids
0   a  NaN  a-NaN
1   b    c    b-c
2   c    e    c-e

You can replace np.nan with the string 'NaN' before the concatenation; otherwise + propagates the missing value and the whole result becomes NaN.
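pandas' string accessor can also do the substitution during the join itself: Series.str.cat takes a sep and an na_rep argument, and na_rep is written in place of any missing value, so the column does not need to be modified first. A minimal sketch on the same frame as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id1": ["a", "b", "c"], "id2": [np.nan, "c", "e"]})

# str.cat joins element-wise with sep; na_rep replaces missing values
# so the row is not turned into NaN. id2 itself is left untouched.
df["ids"] = df["id1"].str.cat(df["id2"], sep="-", na_rep="NaN")
print(df)
```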

Answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id1': ['a', 'b', 'c'], 'id2': [np.nan, 'c', 'e']})
df['id2'] = df['id2'].astype(str)
df['ids'] = df['id1'] + "-" + df['id2']
print(df)

Result:

  id1  id2    ids
0   a  nan  a-nan
1   b    c    b-c
2   c    e    c-e

Alternatively, use:

df['id2'] = df['id2'].fillna('NaN')

instead of the type conversion. This way, you get to pick the exact string (including capitalization) you want for the NaNs.
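The fillna idea extends to more than two columns. The sketch below uses a hypothetical helper, join_columns (not part of pandas), and an illustrative extra column id3; it fills every gap with a chosen placeholder, converts the cells to strings, and joins each row with a separator.

```python
import numpy as np
import pandas as pd


def join_columns(frame, cols, sep="-", na_rep="NaN"):
    """Hypothetical helper: join several columns row by row,
    writing na_rep wherever a value is missing."""
    # Fill missing values first, then make every cell a string
    # so that str.join never receives a float.
    filled = frame[cols].fillna(na_rep).astype(str)
    # apply with axis=1 passes each row as a Series; sep.join iterates its values.
    return filled.apply(sep.join, axis=1)


df = pd.DataFrame({
    "id1": ["a", "b", "c"],
    "id2": [np.nan, "c", "e"],
    "id3": ["x", np.nan, "z"],  # illustrative extra column
})
df["ids"] = join_columns(df, ["id1", "id2", "id3"])
print(df)
```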

Answer:

This will work:

df['ids'] = df['id1'].astype(str) + "-" + df['id2'].astype(str)

Output:

  id1  id2    ids
0   a  NaN  a-nan
1   b    c    b-c
2   d    e    d-e

Another way would be to first convert NaN to string by using .fillna():

df['id2'] = df['id2'].fillna('NaN')  # assign back instead of inplace=True on a column selection
df['ids'] = df['id1'] + "-" + df['id2']

Output:

  id1  id2    ids
0   a  NaN  a-NaN
1   b    c    b-c
2   d    e    d-e
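The .fillna() variant replaces the NaN stored in id2 itself. If the original column should keep its missing value, the placeholder can be applied only inside the concatenation; a minimal sketch using DataFrame.assign, which returns a new frame and leaves df unchanged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id1": ["a", "b", "d"], "id2": [np.nan, "c", "e"]})

# Fill only inside the expression: id2 keeps its missing value,
# and assign returns a new frame instead of modifying df.
out = df.assign(ids=df["id1"] + "-" + df["id2"].fillna("NaN"))
print(out)
```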

Answer:

Given a number a, I would say that you cannot compute a-NaN, as NaN stands for "Not a Number".

If you just want to skip NaN values as if they were 0s, you'll need to filter them (as done in this other case).
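For numeric columns, pandas can also skip or zero-fill missing values without filtering them out first. A minimal sketch with made-up numbers (not from the question): DataFrame.sum skips NaN by default (skipna=True), and Series.sub accepts fill_value to treat a missing operand as 0.

```python
import numpy as np
import pandas as pd

# Illustrative numeric columns, not taken from the question.
num = pd.DataFrame({"x": [1.0, 2.0, np.nan], "y": [np.nan, 5.0, 6.0]})

# Row-wise sum: NaN entries are ignored because skipna defaults to True.
num["total"] = num[["x", "y"]].sum(axis=1)

# Subtraction where a missing operand is treated as 0.
num["diff"] = num["x"].sub(num["y"], fill_value=0)
print(num)
```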
