I have below data frame, I'm trying to calculate Average of "Price" column before and after Rate change, not considering in the average calculation the value of the Month when Rate changed. for example for Client "XY Ltd" rate Increased on Aug. so, Average of Price before rate change would be (5 6)/2 Average price after rate changed would be (-3-5)/2
below is the code I tried, but I don't know how to exclude the values of the Month when rate changed.
Also, I'm trying to create another column with the Subtraction of "Average after rate change" - "average before rate changed"
df['New Column'] = df.groupby(['Client', 'Rate'])['Price'].transform('mean')
Date | Client | Rate | Price |
---|---|---|---|
2022-06-01 | XY Ltd | 1.50 | 5 |
2022-07-01 | XY Ltd | 1.50 | 6 |
2022-08-01 | XY Ltd | 3.00 | 10 |
2022-09-01 | XY Ltd | 3.00 | -3 |
2022-10-01 | XY Ltd | 3.00 | -5 |
2022-06-01 | ZZ Inc | 1.60 | 3 |
2022-07-01 | ZZ Inc | 1.60 | 4 |
2022-08-01 | ZZ Inc | 4.00 | 12 |
2022-09-01 | ZZ Inc | 4.00 | -4 |
2022-10-01 | ZZ Inc | 4.00 | -6 |
desired output
Date | Client | Rate | Price | New Colum | New column 2 |
---|---|---|---|---|---|
2022-06-01 | XY Ltd | 1.50 | 5 | 5.5 | -9.5 |
2022-07-01 | XY Ltd | 1.50 | 6 | 5.5 | -9.5 |
2022-08-01 | XY Ltd | 3.00 | 10 | 0.0 | -9.5 |
2022-09-01 | XY Ltd | 3.00 | -3 | -4.0 | -9.5 |
2022-10-01 | XY Ltd | 3.00 | -5 | -4.0 | -9.5 |
2022-06-01 | ZZ Inc | 1.60 | 3 | 3.5 | -8.5 |
2022-07-01 | ZZ Inc | 1.60 | 4 | 3.5 | -8.5 |
2022-08-01 | ZZ Inc | 4.00 | 12 | 0.0 | -8.5 |
2022-09-01 | ZZ Inc | 4.00 | -4 | -5.0 | -8.5 |
2022-10-01 | ZZ Inc | 4.00 | -6 | -5.0 | -8.5 |
CodePudding user response:
I think this will do what you're asking:
df = df.join(df.groupby('Client')['Rate'].agg(['first']), on='Client')
df['rate_change'] = (df.Rate != df['first']) & (df.Rate.shift() == df['first'])
df['cs'] = df.groupby('Client')['rate_change'].cumsum() * (2 * ~df.rate_change - 1)
df2 = df.groupby(['Client','cs'])['Price'].mean().reset_index(level='cs')
df = df.join(df2[df2.cs == 0].rename(columns={'Price':'pre_change_avg'}).drop(columns='cs'),on='Client')
df = df.join(df2[df2.cs == 1].rename(columns={'Price':'post_change_avg'}).drop(columns='cs'),on='Client')
df['New Column'] = np.where((df.Rate == df['first']), df.pre_change_avg, np.where(df.rate_change, 0, df.post_change_avg))
df['New Column 2'] = df.post_change_avg - df.pre_change_avg
df = df.drop(columns=['first','rate_change','cs','pre_change_avg','post_change_avg'])
Output:
Date Client Rate Price New Column New Column 2
0 2022-06-01 XY Ltd 1.5 5 5.5 -9.5
1 2022-07-01 XY Ltd 1.5 6 5.5 -9.5
2 2022-08-01 XY Ltd 3.0 10 0.0 -9.5
3 2022-09-01 XY Ltd 3.0 -3 -4.0 -9.5
4 2022-10-01 XY Ltd 3.0 -5 -4.0 -9.5
5 2022-06-01 ZZ Ltd 1.6 3 3.5 -8.5
6 2022-07-01 ZZ Ltd 1.6 4 3.5 -8.5
7 2022-08-01 ZZ Ltd 4.0 12 0.0 -8.5
8 2022-09-01 ZZ Ltd 4.0 -4 -5.0 -8.5
9 2022-10-01 ZZ Ltd 4.0 -6 -5.0 -8.5
Explanation:
- create a new column
first
containing the startingRate
for eachClient
- create a boolean column
rate_change
indicating the first row with a newRate
for eachClient
- create a column
cs
containing 0 prior to theRate
change and 1 after it, with a -1 value for the first row of the newRate
- put the average of
Price
for each unique (Client
,cs
) pair into a new dataframedf2
- add new columns
pre_change_avg
andpost_change_average
to the original df - set
New Column
using these before and after theRate
change, and for the row of change use 0 - set
New Column 2
to be the difference betweenpre_change_avg
andpost_change_average
.
CodePudding user response:
Here's an alternative:
# Part 1: `New Column`
grouped = df.groupby("Client")["Rate"].diff()
m, groups = grouped.fillna(0.0).eq(0.0), grouped.ne(0.0).cumsum()
df.loc[m, "New Column"] = df[m].groupby(groups)["Price"].transform("mean")
# Part 2: `New Column 2`
df = df.set_index(groups).join(
df.groupby(groups).last().groupby("Client")["New Column"].diff()
.rename("New Column 2").bfill().to_frame()
).reset_index(drop=True)
New Column
: First groupdf
overClient
and determine whereRate
changes. Then build a maskm
to exclude the first rows after aRate
change, and determine the relevant groups for further.groupby
. To getNew Column
: group overgroups
and build the mean resp. means overPrice
, but only onm
.New Column 2
: Build a minimal temporary dataframe with the uniquegroups
as index and theNew Column
diffs perClient
as values, and then.join
it ondf
ongroups
.
Result:
Date Client Rate Price New Column New Column 2
0 2022-06-01 XY Ltd 1.5 5 5.5 -9.5
1 2022-07-01 XY Ltd 1.5 6 5.5 -9.5
2 2022-08-01 XY Ltd 3.0 10 NaN -9.5
3 2022-09-01 XY Ltd 3.0 -3 -4.0 -9.5
4 2022-10-01 XY Ltd 3.0 -5 -4.0 -9.5
5 2022-06-01 ZZ Inc 1.6 3 3.5 -8.5
6 2022-07-01 ZZ Inc 1.6 4 3.5 -8.5
7 2022-08-01 ZZ Inc 4.0 12 NaN -8.5
8 2022-09-01 ZZ Inc 4.0 -4 -5.0 -8.5
9 2022-10-01 ZZ Inc 4.0 -6 -5.0 -8.5