Home > other >  Calculate average before and after a Date in Python
Calculate average before and after a Date in Python

Time:12-17

I have below data frame, I'm trying to calculate Average of "Price" column before and after Rate change, not considering in the average calculation the value of the Month when Rate changed. for example for Client "XY Ltd" rate Increased on Aug. so, Average of Price before rate change would be (5 6)/2 Average price after rate changed would be (-3-5)/2

below is the code I tried, but I don't know how to exclude the values of the Month when rate changed.
Also, I'm trying to create another column with the Subtraction of "Average after rate change" - "average before rate changed"

df['New Column'] =  df.groupby(['Client', 'Rate'])['Price'].transform('mean')
Date Client Rate Price
2022-06-01 XY Ltd 1.50 5
2022-07-01 XY Ltd 1.50 6
2022-08-01 XY Ltd 3.00 10
2022-09-01 XY Ltd 3.00 -3
2022-10-01 XY Ltd 3.00 -5
2022-06-01 ZZ Inc 1.60 3
2022-07-01 ZZ Inc 1.60 4
2022-08-01 ZZ Inc 4.00 12
2022-09-01 ZZ Inc 4.00 -4
2022-10-01 ZZ Inc 4.00 -6

desired output

Date Client Rate Price New Colum New column 2
2022-06-01 XY Ltd 1.50 5 5.5 -9.5
2022-07-01 XY Ltd 1.50 6 5.5 -9.5
2022-08-01 XY Ltd 3.00 10 0.0 -9.5
2022-09-01 XY Ltd 3.00 -3 -4.0 -9.5
2022-10-01 XY Ltd 3.00 -5 -4.0 -9.5
2022-06-01 ZZ Inc 1.60 3 3.5 -8.5
2022-07-01 ZZ Inc 1.60 4 3.5 -8.5
2022-08-01 ZZ Inc 4.00 12 0.0 -8.5
2022-09-01 ZZ Inc 4.00 -4 -5.0 -8.5
2022-10-01 ZZ Inc 4.00 -6 -5.0 -8.5

CodePudding user response:

I think this will do what you're asking:

df = df.join(df.groupby('Client')['Rate'].agg(['first']), on='Client')
df['rate_change'] = (df.Rate != df['first']) & (df.Rate.shift() == df['first'])
df['cs'] = df.groupby('Client')['rate_change'].cumsum() * (2 * ~df.rate_change - 1)
df2 = df.groupby(['Client','cs'])['Price'].mean().reset_index(level='cs')
df = df.join(df2[df2.cs == 0].rename(columns={'Price':'pre_change_avg'}).drop(columns='cs'),on='Client')
df = df.join(df2[df2.cs == 1].rename(columns={'Price':'post_change_avg'}).drop(columns='cs'),on='Client')
df['New Column'] = np.where((df.Rate == df['first']), df.pre_change_avg, np.where(df.rate_change, 0, df.post_change_avg))
df['New Column 2'] = df.post_change_avg - df.pre_change_avg
df = df.drop(columns=['first','rate_change','cs','pre_change_avg','post_change_avg'])

Output:

        Date  Client  Rate  Price  New Column  New Column 2
0 2022-06-01  XY Ltd   1.5      5         5.5          -9.5
1 2022-07-01  XY Ltd   1.5      6         5.5          -9.5
2 2022-08-01  XY Ltd   3.0     10         0.0          -9.5
3 2022-09-01  XY Ltd   3.0     -3        -4.0          -9.5
4 2022-10-01  XY Ltd   3.0     -5        -4.0          -9.5
5 2022-06-01  ZZ Ltd   1.6      3         3.5          -8.5
6 2022-07-01  ZZ Ltd   1.6      4         3.5          -8.5
7 2022-08-01  ZZ Ltd   4.0     12         0.0          -8.5
8 2022-09-01  ZZ Ltd   4.0     -4        -5.0          -8.5
9 2022-10-01  ZZ Ltd   4.0     -6        -5.0          -8.5

Explanation:

  • create a new column first containing the starting Rate for each Client
  • create a boolean column rate_change indicating the first row with a new Rate for each Client
  • create a column cs containing 0 prior to the Rate change and 1 after it, with a -1 value for the first row of the new Rate
  • put the average of Price for each unique (Client, cs) pair into a new dataframe df2
  • add new columns pre_change_avg and post_change_average to the original df
  • set New Column using these before and after the Rate change, and for the row of change use 0
  • set New Column 2 to be the difference between pre_change_avg and post_change_average.

CodePudding user response:

Here's an alternative:

# Part 1: `New Column`
grouped = df.groupby("Client")["Rate"].diff()
m, groups = grouped.fillna(0.0).eq(0.0), grouped.ne(0.0).cumsum()
df.loc[m, "New Column"] = df[m].groupby(groups)["Price"].transform("mean")

# Part 2: `New Column 2`
df = df.set_index(groups).join(
    df.groupby(groups).last().groupby("Client")["New Column"].diff()
    .rename("New Column 2").bfill().to_frame()
).reset_index(drop=True)
  • New Column: First group df over Client and determine where Rate changes. Then build a mask m to exclude the first rows after a Rate change, and determine the relevant groups for further .groupby. To get New Column: group over groups and build the mean resp. means over Price, but only on m.
  • New Column 2: Build a minimal temporary dataframe with the unique groups as index and the New Column diffs per Client as values, and then .join it on df on groups.

Result:

         Date  Client  Rate  Price  New Column  New Column 2
0  2022-06-01  XY Ltd   1.5      5         5.5          -9.5
1  2022-07-01  XY Ltd   1.5      6         5.5          -9.5
2  2022-08-01  XY Ltd   3.0     10         NaN          -9.5
3  2022-09-01  XY Ltd   3.0     -3        -4.0          -9.5
4  2022-10-01  XY Ltd   3.0     -5        -4.0          -9.5
5  2022-06-01  ZZ Inc   1.6      3         3.5          -8.5
6  2022-07-01  ZZ Inc   1.6      4         3.5          -8.5
7  2022-08-01  ZZ Inc   4.0     12         NaN          -8.5
8  2022-09-01  ZZ Inc   4.0     -4        -5.0          -8.5
9  2022-10-01  ZZ Inc   4.0     -6        -5.0          -8.5
  • Related