Home > other >  Pandas rank values within groupby, starting a new rank if diff is greater than 1
Pandas rank values within groupby, starting a new rank if diff is greater than 1

Time:09-06

I have a sample dataframe as follows:

data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73]}
df=pd.DataFrame.from_dict(data)
df['WeekDiff'] = df.groupby('Store')['Week'].diff().fillna(1)

I added a difference column to find the gaps in the Week column within my data. I have been trying to groupby Store and somehow use the differences column to achieve the below output but with no success. I need the ranks to start from each occurence of a value greater than one until the next such value. Please see a sample output I'd like to achieve.

result_data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73],
      'Rank':[1,1,1,1,1,1,2,2,2,2,1,1,2,2,2,3,3,3,4,4,4,4]}

I am new to python and pandas and I've been trying to google this all day, but couldn't find a solution. Could you please help me how to do this?

Thank you in advance!

CodePudding user response:

You could try as follows:

import pandas as pd

data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73]}

df = pd.DataFrame(data)

df['Rank'] = df.groupby('Store')['Week'].diff()>1
df['Rank'] = df.groupby('Store')['Rank'].cumsum().add(1)

# check with expected output:
result_data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73],
      'Rank':[1,1,1,1,1,1,2,2,2,2,1,1,2,2,2,3,3,3,4,4,4,4]}

result_df = pd.DataFrame(result_data)

df.equals(result_df)
# True

Or as a (lengthy) one-liner:

df['Rank'] = df.set_index('Store').groupby(level=0)\
    .agg(Rank=('Week','diff')).gt(1).groupby(level=0)\
        .cumsum().add(1).reset_index(drop=True)
  • Related