Home > database >  Grouping of subset of multiple Column in pandas
Grouping of subset of multiple Column in pandas

Time:08-06

Having a data set as below.Here I need to group the subset in column and fill the missing values using mode method.

Need to group the value 'Tom' from name and 'UK' from Country and fill the missing value in value using mode.

Name Country Value
Tom USA 30.0
Tom UK 20.0
Tom UK NaN
nick USA 19.0
nick USA 18.0
Tom India 23.0
nick USA 15.0
nick USA 20.0
jack India NaN
Tom UK 23.0
Tom UK 23.0

I tried the code below.

np.where((data['Name'] == 'Tom') & (data['Country']=='UK'),data['Value'].fillna(data['Value'].mode()[0]),data['Value'])

CodePudding user response:

I use .eq() .lt() .ge() etc, to avoid excess () everywhere when combining conditions~

Also, since this isn't an if...else, we can just use pd.DataFrame.mask instead of np.where~

I also assumed that you only want the mode of the same group~ rather than the whole column. If you want the whole column change data.loc[mask, 'Value'] to just data['Value']

mask = (data['Name'].eq('Tom') & 
        data['Country'].eq('UK'))
data['Value'] = data['Value'].mask(mask & data['Value'].isna(), 
                                   data.loc[mask, 'Value'].mode()[0])
    Name Country  Value
0    Tom     USA   30.0
1    Tom      UK   20.0
2    Tom      UK   23.0
3   nick     USA   19.0
4   nick     USA   18.0
5    Tom   India   23.0
6   nick     USA   15.0
7   nick     USA   20.0
8   jack   India    NaN
9    Tom      UK   23.0
10   Tom      UK   23.0

CodePudding user response:

You don't really need to go as far as masking the data for the mode you can simply try:

data['Value'] = np.where((data['Name'] == 'Tom') & (data['Country']=='UK'),data['Value'].fillna(data['Value'].mode()[0]),data['Value'])

Edit: If you'd like to address only those Null values where the Name is Tom and Country UK, you can add an additional criteria:

data['Value'] = np.where((data['Name'] == 'Tom') & (data['Country']=='UK') & (data['Value'].isna()),data['Value'].mode()[0],data['Value'])

CodePudding user response:

here is one way using group by and ffill:

df["Value"] = df.groupby(["Name", "Country"])["Value"].ffill()

output :

    Name Country  Value
0    Tom     USA   20.0
1    Tom      UK   20.0
2    Tom      UK   20.0
3   nick     USA   19.0
4   nick     USA   18.0
5    Tom   India   23.0
6   nick     USA   15.0
7   nick     USA   20.0
8   jack   India    NaN
9    Tom      UK   23.0
10   Tom      UK   23.0
  • Related