Home > database >  Compare a DataFrame to itself? pandas
Compare a DataFrame to itself? pandas

Time:08-31

I have a dataframe with week number as int, item name, and ranking.

For instance:

  item_name       ranking  week_number
0      test             4            1
1      test             3            2

I'd like to add a new column with the ranking evolution since the last week.

The math is very simple:

df['ranking_evolution'] = ranking_previous_week - df['ranking'] 

It would only require exception handling for week 1.

But I'm not sure how to return the ranking previous week. I could do it by iterating over the rows but I'm wondering if there is a cleaner way so I can just declare a column?

The issue is that I'd have to compare the dataframe to itself. I've candidly tried:

df['ranking_evolution'] = df['ranking'].loc[(df[item_name] == df['item_name]) & (df['week_number'] == df['week_number'] - 1) - df['ranking']

But this return NaN values. Even using a copy returned NaN values.

CodePudding user response:

I assume this is a simplistic example, you probably have different products and maybe missing weeks?

A robust way would be to perform a self-merge with the week 1:

(df.merge(df.assign(week_number=df['week_number'] 1),
          on=['item_name', 'week_number'],
          suffixes=(None, '_evolution'),
          how='left')
   .assign(ranking_evolution=lambda d: d['ranking_evolution'].sub(d['ranking']))
 )

Output:

  item_name  ranking  week_number  ranking_evolution
0      test        4            1                NaN
1      test        3            2                1.0

CodePudding user response:

Shortly, try this code to figure out the trick.

import pandas as pd

data = {
    'item_name':   ['test', 'test', 'test', 'test', 'test', 'test', 'test', 'test', 'test', 'test'],
    'ranking':     [4, 3, 2, 1, 2, 3, 4, 5, 6, 7],
    'week_number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}
df = pd.DataFrame(data)

df['ranking_evolution'] = df['ranking'].diff(-1) # this is the line that does the trick

print(df)

Results

  item_name  ranking  week_number  ranking_evolution
     test        4            1                1.0
     test        3            2                1.0
     test        2            3                1.0
     test        1            4               -1.0
  • Related