Suppose I have a DF below:
EOM | Nike | Adidas |
---|---|---|
1/31/2021 | 100 | 80 |
1/31/2022 | 110 | 80 |
1/31/2023 | 132 | 88 |
... | ... | ... |
How do I get:
EOM | Nike | Adidas |
---|---|---|
1/31/2021 | null | null |
1/31/2022 | 10% | 0 |
1/31/2023 | 20% | 10% |
... | ... | ... |
CodePudding user response:
It would be preferable for you to provide a bit more information about the desired output, but it looks like you want a percent change between the end of months for consecutive years.
You can sort the EOM column if it isn't already sorted, then perform a groupby on the month and calculate the percent change. Thanks to @mozway for suggesting the use of a join so that the groupby only needs to be performed once
For example:
df = pd.DataFrame({'EOM':['1/31/2021','1/31/2022','1/31/2023','2/28/2021','2/28/2022','2/28/2023'],'Nike':[100,110,132,100,120,144],'Adidas':[80,80,88,90,90,99]})
df.sort_values(by='EOM', inplace=True)
df['EOM'] = pd.to_datetime(df['EOM'])
df['month'] = df['EOM'].dt.month
df = df.join(df.groupby("month")[['Nike','Adidas']].transform('pct_change')*100, rsuffix="_pct_change")
Result:
>>> df
EOM Nike Adidas month Nike_pct_change Adidas_pct_change
0 2021-01-31 100 80 1 NaN NaN
1 2022-01-31 110 80 1 10.0 0.0
2 2023-01-31 132 88 1 20.0 10.0
3 2021-02-28 100 90 2 NaN NaN
4 2022-02-28 120 90 2 20.0 0.0
5 2023-02-28 144 99 2 20.0 10.0
CodePudding user response:
Use a shift
method to calculate the YoY values.
Code:
import io
import pandas as pd
# Create sample data
s = '''EOM,Nike,Adidas
1/31/2021,100,80
1/31/2022,110,80
1/31/2023,132,88'''
df = pd.read_csv(io.StringIO(s))
# Convert EOM to datetime type
df['EOM'] = pd.to_datetime(df['EOM'])
# Add temporary columns for calculation
df['month'] = df['EOM'].dt.month
df['day'] = df['EOM'].dt.day
# Ensure df is sorted
df = df.sort_values(['day', 'month', 'EOM'])
# Set all the date-related columns as index
df = df.set_index(['EOM', 'month', 'day'])
# Shift a row forward by group
df_shift = df.groupby(['month', 'day']).shift()
# Calculate YoY
df = (df - df_shift).div(df_shift) * 100
# Drop the temporary columns
df = df.reset_index().drop(['month', 'day'], axis=1)
Output:
EOM | Nike | Adidas |
---|---|---|
2021-01-31 | nan | nan |
2022-01-31 | 10 | 0 |
2023-01-31 | 20 | 10 |