How to fill nans with multiple if-else conditions?

Time:08-03

I have a dataset:

    value   score
0   0.0     8
1   0.0     7
2   NaN     4
3   1.0     11
4   2.0     22
5   NaN     12
6   0.0     4
7   NaN     15
8   0.0     5
9   2.0     24
10  1.0     12
11  1.0     15
12  0.0     5
13  2.0     26
14  NaN     28

There are some NaNs in it. I want to fill those NaNs with these conditions:

  • If 'score' is less than 10, fill the NaN with 0.0
  • If 'score' is between 10 and 20, fill the NaN with 1.0
  • If 'score' is greater than 20, fill the NaN with 2.0

How do I do this in pandas?

Here is an example dataframe:

import numpy as np
import pandas as pd

value = [0,0,np.nan,1,2,np.nan,0,np.nan,0,2,1,1,0,2,np.nan]
score = [8,7,4,11,22,12,4,15,5,24,12,15,5,26,28]
df = pd.DataFrame({'value': value, 'score': score})
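As a baseline, the three rules translate directly into a plain Python function applied row by row. A minimal sketch: `score_to_value` is a hypothetical helper, it assumes the middle range includes both ends (10 ≤ score ≤ 20) because the question does not say, and a row-wise `apply` is generally slower than vectorized approaches.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
value = [0, 0, np.nan, 1, 2, np.nan, 0, np.nan, 0, 2, 1, 1, 0, 2, np.nan]
score = [8, 7, 4, 11, 22, 12, 4, 15, 5, 24, 12, 15, 5, 26, 28]
df = pd.DataFrame({'value': value, 'score': score})


def score_to_value(score):
    """Hypothetical helper: map a score to 0.0, 1.0 or 2.0.

    Assumes the middle range includes both ends (10 <= score <= 20).
    """
    if score < 10:
        return 0.0
    elif score <= 20:
        return 1.0
    else:
        return 2.0


# Row-wise baseline: compute a fallback for every row, then fill only the NaNs.
fallback = df['score'].apply(score_to_value)
df['value'] = df['value'].fillna(fallback)
print(df['value'].tolist())
```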

Answer:

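Use pd.cut to bin the scores, then pass the result to fillna and assign it back (calling fillna(..., inplace=True) on df.value is chained assignment, which newer pandas versions warn about and may not apply to df):

# bins are right-closed by default: (-inf, 10] -> 0, (10, 20] -> 1, (20, inf) -> 2
df['value'] = df['value'].fillna(pd.cut(df['score'], [-np.inf, 10, 20, np.inf], labels=[0, 1, 2]).astype(int))
df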
Out[6]: 
    value  score
0     0.0      8
1     0.0      7
2     0.0      4
3     1.0     11
4     2.0     22
5     1.0     12
6     0.0      4
7     1.0     15
8     0.0      5
9     2.0     24
10    1.0     12
11    1.0     15
12    0.0      5
13    2.0     26
14    2.0     28
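The sample data has no score of exactly 10 or 20, so the output above does not show how pd.cut treats the bin edges. A small sketch with hypothetical boundary scores, comparing the default `right=True` with `right=False`:

```python
import numpy as np
import pandas as pd

# Hypothetical scores placed on and next to the bin edges.
edges = pd.Series([9.5, 10, 10.5, 20, 20.5])
bins = [-np.inf, 10, 20, np.inf]

# Default right=True: intervals (-inf, 10], (10, 20], (20, inf)
right_closed = pd.cut(edges, bins, labels=[0, 1, 2]).astype(int)

# right=False: intervals [-inf, 10), [10, 20), [20, inf)
left_closed = pd.cut(edges, bins, right=False, labels=[0, 1, 2]).astype(int)

print(pd.DataFrame({'score': edges,
                    'right=True': right_closed,
                    'right=False': left_closed}))
```

Neither setting makes the middle bin closed on both ends, so pick the one that matches how you read "between 10 and 20".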

Answer:

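You could use numpy.select with conditions on score < 10, 10 ≤ score < 20, etc., but a simpler option is floor division by 10: scores below 10 become 0, scores with 10 ≤ score < 20 become 1, and so on. Floor division keeps counting past 2 (a score of 30 gives 3, a negative score gives a negative number), so clip the result to the range 0 to 2:

# floordiv maps [0, 10) -> 0, [10, 20) -> 1, [20, 30) -> 2; clip(0, 2) caps everything else
df['value'] = df['value'].fillna(df['score'].floordiv(10).clip(0, 2))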
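Why the clip matters: a sketch with hypothetical scores, some outside the question's sample range, comparing plain floor division with the clipped version.

```python
import pandas as pd

# Hypothetical scores, including values below 0 and at or above 30.
scores = pd.Series([-5, 0, 9, 10, 19, 20, 28, 30, 45])

raw = scores.floordiv(10)                 # keeps counting: 30 -> 3, 45 -> 4, -5 -> -1
capped = scores.floordiv(10).clip(0, 2)   # clamped to the three allowed labels

print(pd.DataFrame({'score': scores, 'floordiv': raw, 'clipped': capped}))
```

On the question's data every score is between 4 and 28, so the unclipped version happens to give the same result there.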

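With numpy.select (it returns a plain NumPy array, which fillna rejects, so wrap it in a Series aligned to df.index):

conditions = [df['score'].lt(10),                             # score < 10
              df['score'].between(10, 20, inclusive='left'),  # 10 <= score < 20
              df['score'].ge(20)]                             # score >= 20
df['value'] = df['value'].fillna(pd.Series(np.select(conditions, [0, 1, 2]), index=df.index))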
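np.select checks the conditions in order and takes the first one that is True for each row; rows matching none get `default` (0 unless given). A sketch with hypothetical scores showing how the original overlapping conditions (`between(10, 20)` is inclusive on both ends, so 20 also satisfies `ge(20)`) resolve by position, and why the result must be wrapped in a Series before fillna:

```python
import numpy as np
import pandas as pd

scores = pd.Series([9, 10, 20, 21])

# Overlapping conditions: 20 satisfies both between(10, 20) and ge(20);
# np.select picks the first match, so 20 -> 1.
overlapping = np.select([scores.lt(10), scores.between(10, 20), scores.ge(20)],
                        [0, 1, 2])

# Non-overlapping: inclusive='left' means 10 <= score < 20, so 20 -> 2.
half_open = np.select([scores.lt(10),
                       scores.between(10, 20, inclusive='left'),
                       scores.ge(20)],
                      [0, 1, 2])

values = pd.Series([np.nan, 1.0, np.nan, 2.0])

# Series.fillna accepts a scalar, dict or Series, not a bare ndarray.
try:
    values.fillna(half_open)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Wrapping the array in a Series with the same index lets fillna align it.
filled = values.fillna(pd.Series(half_open, index=values.index))
print(overlapping, half_open, filled.tolist())
```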

output:

    value  score
0     0.0      8
1     0.0      7
2     0.0      4
3     1.0     11
4     2.0     22
5     1.0     12
6     0.0      4
7     1.0     15
8     0.0      5
9     2.0     24
10    1.0     12
11    1.0     15
12    0.0      5
13    2.0     26
14    2.0     28

Answer:

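Use np.select (or pd.cut) to map the intervals to values, then fillna. Scores matching neither condition (10 ≤ score ≤ 20) fall through to the default of 1:

mapping = np.select([df['score'] < 10, df['score'] > 20],
                    [0, 2], default=1)
# np.select returns an ndarray; fillna needs a scalar, dict or Series
df['value'] = df['value'].fillna(pd.Series(mapping, index=df.index))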
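To check that the approaches agree, a sketch that rebuilds the question's sample frame and runs the pd.cut, clipped floordiv and two-condition np.select versions side by side. No score in the sample sits on a boundary (10 or 20), so all three should produce the same column; they can differ only at those edges.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
value = [0, 0, np.nan, 1, 2, np.nan, 0, np.nan, 0, 2, 1, 1, 0, 2, np.nan]
score = [8, 7, 4, 11, 22, 12, 4, 15, 5, 24, 12, 15, 5, 26, 28]
df = pd.DataFrame({'value': value, 'score': score})

# pd.cut with the default right-closed bins.
via_cut = df['value'].fillna(
    pd.cut(df['score'], [-np.inf, 10, 20, np.inf], labels=[0, 1, 2]).astype(int))

# Floor division, clipped to 0..2.
via_floordiv = df['value'].fillna(df['score'].floordiv(10).clip(0, 2))

# Two conditions with a default of 1 for the middle range.
mapping = np.select([df['score'] < 10, df['score'] > 20], [0, 2], default=1)
via_select = df['value'].fillna(pd.Series(mapping, index=df.index))

print(via_cut.tolist() == via_floordiv.tolist() == via_select.tolist())
```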