I have a dataframe with week number as int, item name, and ranking.
For instance:
item_name ranking week_number
0 test 4 1
1 test 3 2
I'd like to add a new column with the ranking evolution since the last week.
The math is very simple:
df['ranking_evolution'] = ranking_previous_week - df['ranking']
It would only require exception handling for week 1.
But I'm not sure how to return the ranking previous week. I could do it by iterating over the rows but I'm wondering if there is a cleaner way so I can just declare a column?
The issue is that I'd have to compare the dataframe to itself. I've candidly tried:
df['ranking_evolution'] = df['ranking'].loc[(df[item_name] == df['item_name]) & (df['week_number'] == df['week_number'] - 1) - df['ranking']
But this return NaN
values.
Even using a copy returned NaN
values.
CodePudding user response:
I assume this is a simplistic example, you probably have different products and maybe missing weeks?
A robust way would be to perform a self-merge with the week 1:
(df.merge(df.assign(week_number=df['week_number'] 1),
on=['item_name', 'week_number'],
suffixes=(None, '_evolution'),
how='left')
.assign(ranking_evolution=lambda d: d['ranking_evolution'].sub(d['ranking']))
)
Output:
item_name ranking week_number ranking_evolution
0 test 4 1 NaN
1 test 3 2 1.0
CodePudding user response:
Shortly, try this code to figure out the trick.
import pandas as pd
data = {
'item_name': ['test', 'test', 'test', 'test', 'test', 'test', 'test', 'test', 'test', 'test'],
'ranking': [4, 3, 2, 1, 2, 3, 4, 5, 6, 7],
'week_number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}
df = pd.DataFrame(data)
df['ranking_evolution'] = df['ranking'].diff(-1) # this is the line that does the trick
print(df)
Results
item_name ranking week_number ranking_evolution
test 4 1 1.0
test 3 2 1.0
test 2 3 1.0
test 1 4 -1.0