Home > Enterprise >  Pandas Monthly YoY?
Pandas Monthly YoY?

Time:02-13

Suppose I have a DF below:

EOM Nike Adidas
1/31/2021 100 80
1/31/2022 110 80
1/31/2023 132 88
... ... ...

How do I get:

EOM Nike Adidas
1/31/2021 null null
1/31/2022 10% 0
1/31/2023 20% 10%
... ... ...

CodePudding user response:

It would be preferable for you to provide a bit more information about the desired output, but it looks like you want a percent change between the end of months for consecutive years.

You can sort the EOM column if it isn't already sorted, then perform a groupby on the month and calculate the percent change. Thanks to @mozway for suggesting the use of a join so that the groupby only needs to be performed once

For example:

df = pd.DataFrame({'EOM':['1/31/2021','1/31/2022','1/31/2023','2/28/2021','2/28/2022','2/28/2023'],'Nike':[100,110,132,100,120,144],'Adidas':[80,80,88,90,90,99]})
df.sort_values(by='EOM', inplace=True)
df['EOM'] = pd.to_datetime(df['EOM'])
df['month'] = df['EOM'].dt.month

df = df.join(df.groupby("month")[['Nike','Adidas']].transform('pct_change')*100, rsuffix="_pct_change")

Result:

>>> df
         EOM  Nike  Adidas  month  Nike_pct_change  Adidas_pct_change
0 2021-01-31   100      80      1              NaN                NaN
1 2022-01-31   110      80      1             10.0                0.0
2 2023-01-31   132      88      1             20.0               10.0
3 2021-02-28   100      90      2              NaN                NaN
4 2022-02-28   120      90      2             20.0                0.0
5 2023-02-28   144      99      2             20.0               10.0

CodePudding user response:

Use a shift method to calculate the YoY values.

Code:

import io
import pandas as pd

# Create sample data
s = '''EOM,Nike,Adidas
1/31/2021,100,80
1/31/2022,110,80
1/31/2023,132,88'''
df = pd.read_csv(io.StringIO(s))

# Convert EOM to datetime type
df['EOM'] = pd.to_datetime(df['EOM'])

# Add temporary columns for calculation
df['month'] = df['EOM'].dt.month
df['day'] = df['EOM'].dt.day

# Ensure df is sorted
df = df.sort_values(['day', 'month', 'EOM'])

# Set all the date-related columns as index
df = df.set_index(['EOM', 'month', 'day'])

# Shift a row forward by group
df_shift = df.groupby(['month', 'day']).shift()

# Calculate YoY
df = (df - df_shift).div(df_shift) * 100

# Drop the temporary columns
df = df.reset_index().drop(['month', 'day'], axis=1)

Output:

EOM Nike Adidas
2021-01-31 nan nan
2022-01-31 10 0
2023-01-31 20 10
  • Related