Home > Back-end >  Grouping of subset of a Column in pandas
Grouping of subset of a Column in pandas

Time:08-05

Having a data set as below.Here I need to group the subset in column and fill the missing values using mode method.

Need to group the value 'Tom' from name and fill the missing value in value using mode.

     Name  Value
0     Tom   20.0
1     Tom   20.0
2     Tom    NaN
3    nick   19.0
4    nick   18.0
5   krish   15.0
6    nick   15.0
7    nick   20.0
8    jack    NaN
9     Tom   23.0
10    Tom    NaN

I tried the code below:

data['value']= data.loc[data["Name"]=='Tom'].fillna(data['value'].mode()[0], inplace=True)

Didn't get the output as expected.

CodePudding user response:

To replace on specific conditions you could use np.where() to evaluate the condition, when it is met, replace it with the mode, otherwise retain the column's original value. Kindly try:

data['Value'] = np.where(data['Name'] =='Tom',data['Value'].fillna(data['Value'].mode()[0]),data['Value'])

This returns:

     Name  Value
0     Tom   20.0
1     Tom   20.0
2     Tom   20.0
3    nick   19.0
4    nick   18.0
5   krish   15.0
6    nick   15.0
7    nick   20.0
8    jack    NaN
9     Tom   23.0
10    Tom   20.0

CodePudding user response:

You can fill all NaN with:

import pandas as pd
import numpy as np
from scipy import stats

df['Value'] = df.groupby('Name')['Value'].transform(lambda x: x.fillna(stats.mode(x)[0][0]))
df
###
     Name  Value
0     Tom   20.0
1     Tom   20.0
2     Tom   20.0
3    nick   19.0
4    nick   18.0
5   krish   15.0
6    nick   15.0
7    nick   20.0
8    jack    NaN
9     Tom   23.0
10    Tom   20.0




For multiple columns' condition

df2
###
    Name Country  Value
0    Tom     USA   20.0
1    Tom      UK   20.0
2    Tom      UK    NaN
3   nick     USA   19.0
4   nick     USA   18.0
5    Tom   India   23.0
6   nick     USA   15.0
7   nick     USA   20.0
8   jack   India    NaN
9    Tom      UK   23.0
10   Tom      UK   23.0
df2['Value'] = df2.groupby(['Name','Country'])['Value'].transform(lambda x: x.fillna(stats.mode(x)[0][0]))
df2
###
    Name Country  Value
0    Tom     USA   20.0
1    Tom      UK   20.0
2    Tom      UK   23.0
3   nick     USA   19.0
4   nick     USA   18.0
5    Tom   India   23.0
6   nick     USA   15.0
7   nick     USA   20.0
8   jack   India    NaN
9    Tom      UK   23.0
10   Tom      UK   23.0
  • Related