Grouping of subset of multiple Column in pandas-CodePudding

Having a data set as below.Here I need to group the subset in column and fill the missing values using mode method.

Need to group the value 'Tom' from name and 'UK' from Country and fill the missing value in value using mode.

Name	Country	Value
Tom	USA	30.0
Tom	UK	20.0
Tom	UK	NaN
nick	USA	19.0
nick	USA	18.0
Tom	India	23.0
nick	USA	15.0
nick	USA	20.0
jack	India	NaN
Tom	UK	23.0
Tom	UK	23.0

I tried the code below.

np.where((data['Name'] == 'Tom') & (data['Country']=='UK'),data['Value'].fillna(data['Value'].mode()[0]),data['Value'])

CodePudding user response：

I use .eq() .lt() .ge() etc, to avoid excess () everywhere when combining conditions~

Also, since this isn't an if...else, we can just use pd.DataFrame.mask instead of np.where~

I also assumed that you only want the mode of the same group~ rather than the whole column. If you want the whole column change data.loc[mask, 'Value'] to just data['Value']

mask = (data['Name'].eq('Tom') & 
        data['Country'].eq('UK'))
data['Value'] = data['Value'].mask(mask & data['Value'].isna(), 
                                   data.loc[mask, 'Value'].mode()[0])

    Name Country  Value
0    Tom     USA   30.0
1    Tom      UK   20.0
2    Tom      UK   23.0
3   nick     USA   19.0
4   nick     USA   18.0
5    Tom   India   23.0
6   nick     USA   15.0
7   nick     USA   20.0
8   jack   India    NaN
9    Tom      UK   23.0
10   Tom      UK   23.0

CodePudding user response：

You don't really need to go as far as masking the data for the mode you can simply try:

data['Value'] = np.where((data['Name'] == 'Tom') & (data['Country']=='UK'),data['Value'].fillna(data['Value'].mode()[0]),data['Value'])

Edit: If you'd like to address only those Null values where the Name is Tom and Country UK, you can add an additional criteria:

data['Value'] = np.where((data['Name'] == 'Tom') & (data['Country']=='UK') & (data['Value'].isna()),data['Value'].mode()[0],data['Value'])

CodePudding user response：

here is one way using group by and ffill:

df["Value"] = df.groupby(["Name", "Country"])["Value"].ffill()

output :

    Name Country  Value
0    Tom     USA   20.0
1    Tom      UK   20.0
2    Tom      UK   20.0
3   nick     USA   19.0
4   nick     USA   18.0
5    Tom   India   23.0
6   nick     USA   15.0
7   nick     USA   20.0
8   jack   India    NaN
9    Tom      UK   23.0
10   Tom      UK   23.0