I have a dataframe generated from the following code:
```python
vote_mode = dataset.groupby(['ini_num','dep_parl_group'])['vote'].agg(lambda x: disambiguated_mode(x)).to_frame()
```
This gives me a dataframe indexed by (`ini_num`, `dep_parl_group`) with a single `vote` column holding the most frequent label for each group, like the following:
ini_num | dep_parl_group | vote |
---|---|---|
12 | A | vot_in_favour |
12 | B | vot_against |
99 | A | vot_against |
99 | C | vot_in_favour |
99 | D | vot_against |
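The question does not show `disambiguated_mode`, so the reproducible sketch below assumes a hypothetical stand-in that returns the most frequent label and breaks ties alphabetically (via `Series.mode()`). The sample rows are copied from the dataset table shown further down.

```python
import pandas as pd

# Sample rows copied from the question's dataset table
dataset = pd.DataFrame({
    "ini_num": [12, 12, 12, 12, 12, 99, 99, 99, 99, 99, 99],
    "dep_parl_group": ["A", "A", "A", "B", "B", "A", "A", "A", "C", "D", "D"],
    "vote": ["vot_in_favour", "vot_in_favour", "vot_against",
             "vot_against", "vot_against",
             "vot_against", "vot_against", "vot_in_favour",
             "vot_in_favour", "vot_against", "vot_against"],
})

def disambiguated_mode(x: pd.Series) -> str:
    """Hypothetical stand-in: most frequent label, ties broken alphabetically."""
    # Series.mode() returns all modes sorted, so .iloc[0] is the first one
    return x.mode().iloc[0]

# One row per (ini_num, dep_parl_group) pair; the two keys become a MultiIndex
vote_mode = (dataset.groupby(["ini_num", "dep_parl_group"])["vote"]
             .agg(lambda x: disambiguated_mode(x))
             .to_frame())
print(vote_mode)
```

The printed frame matches the table above, with `ini_num` and `dep_parl_group` as index levels rather than ordinary columns.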
I would like to change the `vote` values of `dataset` (the dataframe from which the groupby was built) to match the values in the groupby result. The dataset is as follows:
ini_num | dep_parl_group | vote | what I want |
---|---|---|---|
12 | A | vot_in_favour | vot_in_favour |
12 | A | vot_in_favour | vot_in_favour |
12 | A | vot_against | vot_in_favour |
12 | B | vot_against | vot_against |
12 | B | vot_against | vot_against |
99 | A | vot_against | vot_against |
99 | A | vot_against | vot_against |
99 | A | vot_in_favour | vot_against |
99 | C | vot_in_favour | vot_in_favour |
99 | D | vot_against | vot_against |
99 | D | vot_against | vot_against |
Specifically, I would like every row of `dataset` to take its `vote` value from the `vote_mode` row with the same `ini_num` and `dep_parl_group`.
Thanks in advance for any help you can provide.
User response:
Try this; here I substituted `x.mode()[0]` for `disambiguated_mode`:
```python
dataset['vote_1'] = (dataset.groupby(['ini_num','dep_parl_group'])['vote']
                     .transform(lambda x: x.mode()[0]))
```
Output:
```
    ini_num  dep_parl_group           vote    what I want         vote_1
0        12               A  vot_in_favour  vot_in_favour  vot_in_favour
1        12               A  vot_in_favour  vot_in_favour  vot_in_favour
2        12               A    vot_against  vot_in_favour  vot_in_favour
3        12               B    vot_against    vot_against    vot_against
4        12               B    vot_against    vot_against    vot_against
5        99               A    vot_against    vot_against    vot_against
6        99               A    vot_against    vot_against    vot_against
7        99               A  vot_in_favour    vot_against    vot_against
8        99               C  vot_in_favour  vot_in_favour  vot_in_favour
9        99               D    vot_against    vot_against    vot_against
10       99               D    vot_against    vot_against    vot_against
```
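A self-contained version of this answer, assuming the sample rows from the question. It overwrites `vote` directly instead of adding `vote_1`; `transform` returns one value per original row, so the group mode lines up with `dataset`'s index. Swap the lambda for the real `disambiguated_mode` if you have it.

```python
import pandas as pd

# Sample rows copied from the question's dataset table
dataset = pd.DataFrame({
    "ini_num": [12, 12, 12, 12, 12, 99, 99, 99, 99, 99, 99],
    "dep_parl_group": ["A", "A", "A", "B", "B", "A", "A", "A", "C", "D", "D"],
    "vote": ["vot_in_favour", "vot_in_favour", "vot_against",
             "vot_against", "vot_against",
             "vot_against", "vot_against", "vot_in_favour",
             "vot_in_favour", "vot_against", "vot_against"],
})

# transform broadcasts each group's mode back to every row of that group,
# so the result shares dataset's index and can overwrite "vote" in place.
# Series.mode() stands in for the question's disambiguated_mode.
dataset["vote"] = (dataset.groupby(["ini_num", "dep_parl_group"])["vote"]
                   .transform(lambda x: x.mode().iloc[0]))
print(dataset)
```

`mode()` returns its results sorted, so `.iloc[0]` picks the alphabetically first label when a group has a tie; that may differ from whatever rule `disambiguated_mode` applies.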
User response:
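You can set the index of the original dataframe to `['ini_num', 'dep_parl_group']` and then do a left join with `vote_mode`. Because both frames have a `vote` column, the suffixes produce `vote_old` (the original labels) and `vote_new` (the group mode):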
```python
dataset.set_index(['ini_num', 'dep_parl_group']).join(vote_mode, on=['ini_num', 'dep_parl_group'], lsuffix='_old', rsuffix='_new')
```
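A minimal sketch of the same left join that also writes the result back into `vote`. As a small variation on the answer, it passes the key columns straight to `on` instead of moving them into the index first; that keeps `dataset`'s original row index, so `vote_new` can be assigned back by index alignment. `vote_mode` is rebuilt here with `Series.mode()` as a stand-in for `disambiguated_mode`.

```python
import pandas as pd

# Sample rows copied from the question's dataset table
dataset = pd.DataFrame({
    "ini_num": [12, 12, 12, 12, 12, 99, 99, 99, 99, 99, 99],
    "dep_parl_group": ["A", "A", "A", "B", "B", "A", "A", "A", "C", "D", "D"],
    "vote": ["vot_in_favour", "vot_in_favour", "vot_against",
             "vot_against", "vot_against",
             "vot_against", "vot_against", "vot_in_favour",
             "vot_in_favour", "vot_against", "vot_against"],
})

keys = ["ini_num", "dep_parl_group"]
# Stand-in for the question's vote_mode (MultiIndex on keys, one "vote" column)
vote_mode = dataset.groupby(keys)["vote"].agg(lambda x: x.mode().iloc[0]).to_frame()

# Left join (the default for DataFrame.join): each row looks up its
# (ini_num, dep_parl_group) pair in vote_mode's MultiIndex.
joined = dataset.join(vote_mode, on=keys, lsuffix="_old", rsuffix="_new")

# joined keeps dataset's index, so the new labels align row by row
dataset["vote"] = joined["vote_new"]
print(dataset)
```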