I have a pandas dataframe such as:
group | month | value |
---|---|---|
1 | 1 | 2 |
1 | 2 | 2 |
1 | 3 | 3 |
2 | 1 | 7 |
2 | 2 | 8 |
2 | 3 | 8 |
3 | 1 | 9 |
3 | 2 | 0 |
3 | 3 | 1 |
And I want to calculate a new column ('want' in the below) equal to the value where month == 2, per group, as shown below:
group | month | value | want |
---|---|---|---|
1 | 1 | 2 | 2 |
1 | 2 | 2 | 2 |
1 | 3 | 3 | 2 |
2 | 1 | 7 | 8 |
2 | 2 | 8 | 8 |
2 | 3 | 8 | 8 |
3 | 1 | 9 | 0 |
3 | 2 | 0 | 0 |
3 | 3 | 1 | 0 |
Anyone able to help?
CodePudding user response:
You can use map
:
df['want'] = df['group'].map(df[df['month'] == 2].set_index('group')['value'])
print(df)
# Output
group month value want
0 1 1 2 2
1 1 2 2 2
2 1 3 3 2
3 2 1 7 8
4 2 2 8 8
5 2 3 8 8
6 3 1 9 0
7 3 2 0 0
8 3 3 1 0
CodePudding user response:
Guess I could just create a groupby df (groupby group where mth == 2) then merge back to it.
Will just go with that instead of attempting to do via a groupby.apply route!
CodePudding user response:
You can try this,
grouped_df = df.groupby("group")
groups = []
wants = []
for group in df["group"].unique():
group_df = grouped_df.get_group(group)
want = group_df.loc[group_df["month"] == 2, "value"].values[0]
[wants.append(want) for i in range(group_df.shape[0])]
df["wants"] = wants
Output -
group | month | value | wants | |
---|---|---|---|---|
0 | 1 | 1 | 2 | 2 |
1 | 1 | 2 | 2 | 2 |
2 | 1 | 3 | 3 | 2 |
3 | 2 | 1 | 7 | 8 |
4 | 2 | 2 | 8 | 8 |
5 | 2 | 3 | 8 | 8 |
6 | 3 | 1 | 9 | 0 |
7 | 3 | 2 | 0 | 0 |
8 | 3 | 3 | 1 | 0 |
CodePudding user response:
You can use:
df['want'] = df['value'].where(df['month']==2).groupby(df['group']).transform('mean').astype(int)
Result
print(df)
group month value want
0 1 1 2 2
1 1 2 2 2
2 1 3 3 2
3 2 1 7 8
4 2 2 8 8
5 2 3 8 8
6 3 1 9 0
7 3 2 0 0
8 3 3 1 0
In a case whereby for a certain group we have more than one observation for
month==2
, it will fill want
with the mean value
of those observations where month==2
for that group.
For example: Given this new dataframe(df_new)
group | month | value | |
---|---|---|---|
0 | 1 | 1 | 2 |
1 | 1 | 2 | 2 |
2 | 1 | 2 | 4 |
3 | 1 | 3 | 3 |
4 | 2 | 1 | 7 |
5 | 2 | 2 | 8 |
6 | 2 | 3 | 8 |
7 | 3 | 1 | 9 |
8 | 3 | 2 | 0 |
9 | 3 | 3 | 1 |
Running
df_new['want'] = df_new['value'].where(df_mean['month']==2).groupby(df_new['group']).transform('mean').astype(int)
Result
print(df_new)
group month value want
0 1 1 2 3
1 1 2 2 3
2 1 2 4 3
3 1 3 3 3
4 2 1 7 8
5 2 2 8 8
6 2 3 8 8
7 3 1 9 0
8 3 2 0 0
9 3 3 1 0