I have a sample dataframe as follows:
data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73]}
df=pd.DataFrame.from_dict(data)
df['WeekDiff'] = df.groupby('Store')['Week'].diff().fillna(1)
I added a difference column to find the gaps in the Week column within my data. I have been trying to groupby Store and somehow use the differences column to achieve the below output but with no success. I need the ranks to start from each occurence of a value greater than one until the next such value. Please see a sample output I'd like to achieve.
result_data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73],
'Rank':[1,1,1,1,1,1,2,2,2,2,1,1,2,2,2,3,3,3,4,4,4,4]}
I am new to python and pandas and I've been trying to google this all day, but couldn't find a solution. Could you please help me how to do this?
Thank you in advance!
CodePudding user response:
You could try as follows:
import pandas as pd
data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73]}
df = pd.DataFrame(data)
df['Rank'] = df.groupby('Store')['Week'].diff()>1
df['Rank'] = df.groupby('Store')['Rank'].cumsum().add(1)
# check with expected output:
result_data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73],
'Rank':[1,1,1,1,1,1,2,2,2,2,1,1,2,2,2,3,3,3,4,4,4,4]}
result_df = pd.DataFrame(result_data)
df.equals(result_df)
# True
Or as a (lengthy) one-liner:
df['Rank'] = df.set_index('Store').groupby(level=0)\
.agg(Rank=('Week','diff')).gt(1).groupby(level=0)\
.cumsum().add(1).reset_index(drop=True)