I have an example data frame
import pandas as pd
df = pd.DataFrame({'COTA':['A','A','A','A','A','B','B','B','B'],
'Date':['14/10/2021','19/10/2020','29/10/2019','30/09/2021','20/09/2020','20/10/2021','29/10/2020','15/10/2019','10/09/2020'],
'Mark':[1,2,3,4,5,1,2,3,3]
})
print(df)
For each "COTA" I have a date and a "mark" I need to create a new column with the highest value of the "COTA" of the previous year regardless of the month.
Example
COTA Date Mark Max_MarkLastYear
0 A 14/10/2021 1 5
1 A 19/10/2020 2 3
2 A 29/10/2019 3 NaN
3 A 30/09/2021 4 5
4 A 20/09/2020 5 3
5 B 20/10/2021 1 2
6 B 29/10/2020 2 3
7 B 15/10/2019 3 NaN
8 B 10/09/2020 3 3
in line 0 "COTA" = A, the "Mark" = 1 and the highest "Mark" of the previous year (2020)is 5
How can I create this new column
CodePudding user response:
If need match by previous years use:
df['Date'] = pd.to_datetime(df['Date'])
df['LastYear'] = df['Date'] - pd.offsets.YearEnd(0)
s1 = df.groupby(['Found', 'LastYear'])['Mark'].max()
s2 = s1.rename(index=lambda x: x pd.offsets.DateOffset(years=1), level=1)
df = df.join(s2.rename('Max_MarkLastYear'), on=['Found', 'LastYear'])
print (df)
Found Date Mark LastYear Max_MarkLastYear
0 A 2021-10-14 1 2021-12-31 5.0
1 A 2020-10-19 2 2020-12-31 3.0
2 A 2019-10-29 3 2019-12-31 NaN
3 A 2021-09-30 4 2021-12-31 5.0
4 A 2020-09-20 5 2020-12-31 3.0
5 B 2021-10-20 1 2021-12-31 3.0
6 B 2020-10-29 2 2020-12-31 3.0
7 B 2019-10-15 3 2019-12-31 NaN
8 B 2020-10-09 3 2020-12-31 3.0
If need match by previous month and year use:
df['Date'] = pd.to_datetime(df['Date'])
df['LastYear'] = pd.to_datetime(df['Date']).dt.to_period('m')
s1 = df.groupby(['Found', 'LastYear'])['Mark'].max()
s2 = s1.rename(index=lambda x: x 12, level=1)
df = df.join(s2.rename('Max_MarkLastYear'), on=['Found', 'LastYear'])
print (df)
Found Date Mark LastYear Max_MarkLastYear
0 A 2021-10-14 1 2021-10 2.0
1 A 2020-10-19 2 2020-10 3.0
2 A 2019-10-29 3 2019-10 NaN
3 A 2021-09-30 4 2021-09 5.0
4 A 2020-09-20 5 2020-09 NaN
5 B 2021-10-20 1 2021-10 3.0
6 B 2020-10-29 2 2020-10 3.0
7 B 2019-10-15 3 2019-10 NaN
8 B 2020-10-09 3 2020-10 3.0