Change values of dataframe based on values of groupby

Time:06-29

I have a dataframe generated from the following code:

vote_mode = dataset.groupby(['ini_num','dep_parl_group'])['vote'].agg(lambda x: disambiguated_mode(x)).to_frame()

This gives me a three-column dataframe (ini_num, dep_parl_group, vote), where vote is the most frequent label in each group, like the following:

ini_num  dep_parl_group  vote
12       A               vot_in_favour
         B               vot_against
99       A               vot_against
         C               vot_in_favour
         D               vot_against

I would like to change the vote values of dataset (the dataframe from which the groupby was built) to match the values in the groupby dataframe. The dataset is as follows:

ini_num  dep_parl_group  vote           what I want
12       A               vot_in_favour  vot_in_favour
12       A               vot_in_favour  vot_in_favour
12       A               vot_against    vot_in_favour
12       B               vot_against    vot_against
12       B               vot_against    vot_against
99       A               vot_against    vot_against
99       A               vot_against    vot_against
99       A               vot_in_favour  vot_against
99       C               vot_in_favour  vot_in_favour
99       D               vot_against    vot_against
99       D               vot_against    vot_against

Specifically, I would like the vote value of every row of dataset to be replaced by the vote from the groupby dataframe for the same ini_num and dep_parl_group.

Thanks in advance for any help you can provide.

CodePudding user response:

Try this. Here I substituted x.mode()[0] for disambiguated_mode:

dataset['vote_1'] = (dataset.groupby(['ini_num','dep_parl_group'])['vote']
                            .transform(lambda x: x.mode()[0]))

Output:

    ini_num dep_parl_group           vote    what I want         vote_1
0        12              A  vot_in_favour  vot_in_favour  vot_in_favour
1        12              A  vot_in_favour  vot_in_favour  vot_in_favour
2        12              A    vot_against  vot_in_favour  vot_in_favour
3        12              B    vot_against    vot_against    vot_against
4        12              B    vot_against    vot_against    vot_against
5        99              A    vot_against    vot_against    vot_against
6        99              A    vot_against    vot_against    vot_against
7        99              A  vot_in_favour    vot_against    vot_against
8        99              C  vot_in_favour  vot_in_favour  vot_in_favour
9        99              D    vot_against    vot_against    vot_against
10       99              D    vot_against    vot_against    vot_against
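
To overwrite vote in place rather than add a vote_1 column, the same transform can be assigned back to the original column. Below is a minimal, self-contained sketch using the sample rows from the question. first_mode is a hypothetical stand-in for disambiguated_mode, whose definition is not shown; it simply takes the first value that mode() returns, so your own tie-breaking may differ.

```python
import pandas as pd

# Sample rows copied from the question.
dataset = pd.DataFrame({
    "ini_num": [12, 12, 12, 12, 12, 99, 99, 99, 99, 99, 99],
    "dep_parl_group": ["A", "A", "A", "B", "B", "A", "A", "A", "C", "D", "D"],
    "vote": [
        "vot_in_favour", "vot_in_favour", "vot_against",
        "vot_against", "vot_against",
        "vot_against", "vot_against", "vot_in_favour",
        "vot_in_favour",
        "vot_against", "vot_against",
    ],
})


def first_mode(votes: pd.Series) -> str:
    # Hypothetical stand-in for disambiguated_mode (an assumption):
    # mode() returns the most frequent value(s); take the first one.
    return votes.mode().iat[0]


# transform broadcasts each group's single result back onto every row
# of that group, so the result lines up with dataset's original index.
dataset["vote"] = (
    dataset.groupby(["ini_num", "dep_parl_group"])["vote"].transform(first_mode)
)

print(dataset)
```

Because transform returns a Series with the same index as dataset, no merge or reindexing step is needed.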

CodePudding user response:

You can set the index of the original dataframe to ['ini_num', 'dep_parl_group'], then do a left join with vote_mode:

dataset.set_index(['ini_num', 'dep_parl_group']).join(vote_mode, on=['ini_num', 'dep_parl_group'], lsuffix='_old', rsuffix='_new')
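
A self-contained sketch of the join approach, assuming the sample rows from the question. Here the key columns stay as ordinary columns and are passed to join via on=, which is a variation on the line above rather than the same call. x.mode().iat[0] is an assumed stand-in for disambiguated_mode.

```python
import pandas as pd

# Sample rows copied from the question.
dataset = pd.DataFrame({
    "ini_num": [12, 12, 12, 12, 12, 99, 99, 99, 99, 99, 99],
    "dep_parl_group": ["A", "A", "A", "B", "B", "A", "A", "A", "C", "D", "D"],
    "vote": [
        "vot_in_favour", "vot_in_favour", "vot_against",
        "vot_against", "vot_against",
        "vot_against", "vot_against", "vot_in_favour",
        "vot_in_favour",
        "vot_against", "vot_against",
    ],
})

keys = ["ini_num", "dep_parl_group"]

# Same shape as vote_mode in the question: one row per group,
# indexed by (ini_num, dep_parl_group), with a single vote column.
vote_mode = dataset.groupby(keys)["vote"].agg(lambda x: x.mode().iat[0]).to_frame()

# join defaults to a left join; on=keys matches dataset's key columns
# against vote_mode's MultiIndex. rsuffix renames the overlapping
# vote column coming from vote_mode to vote_new.
joined = dataset.join(vote_mode, on=keys, rsuffix="_new")

# A left join keeps dataset's row order, so the values can be
# written back positionally.
dataset["vote"] = joined["vote_new"].to_numpy()

print(dataset)
```

The join route is mostly useful when vote_mode already exists and you do not want to recompute the mode; otherwise transform does the same thing in one step.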