I'm trying to compute the growth (in %) between two values from different periods. Here is what my DataFrame looks like:
     sessionSource dateRange  activeUsers
0    instagram.com   current            5
1    instagram.com  previous            0
2  l.instagram.com   current           83
3  l.instagram.com  previous           11
4     snapchat.com   current            2
5     snapchat.com  previous            1
What I'm trying to get is:
     sessionSource dateRange  activeUsers Growth
0    instagram.com   current            5    xx%
2  l.instagram.com   current           83    xx%
4     snapchat.com   current            2    xx%
I'm not a pandas expert. I tried a few things, but nothing came close to what I need.
Thanks a lot for any help.
CodePudding user response:
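Assuming you just need the percent change from previous to current, and the rows within each source are ordered previous first, then current, you can group the data by source and take the percent change within each group. (In your sample, current comes first, so sort the rows beforehand.) Use pct_change() (see pandas.Series.pct_change) on the grouped object and you should be good. Note that pct_change() returns a fraction, so multiply by 100 if you want a percentage.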
df['Growth'] = df.groupby('sessionSource')['activeUsers'].pct_change()
For example (taken from the official documentation, applied to a Series):
>>> s = pd.Series([90, 91, 85])
>>> s
0    90
1    91
2    85
dtype: int64
>>> s.pct_change()
0         NaN
1    0.011111
2   -0.065934
dtype: float64
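Applied to the DataFrame from the question, this could look like the sketch below. It assumes each source has exactly one previous and one current row. The sort step and the final filter are my additions: the sample data lists current before previous, so the rows are reordered first, and only the current rows are kept at the end.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "sessionSource": ["instagram.com", "instagram.com",
                      "l.instagram.com", "l.instagram.com",
                      "snapchat.com", "snapchat.com"],
    "dateRange": ["current", "previous"] * 3,
    "activeUsers": [5, 0, 83, 11, 2, 1],
})

# Put "previous" before "current" inside each source.
# "previous" sorts after "current" alphabetically, hence descending order.
ordered = df.sort_values(["sessionSource", "dateRange"], ascending=[True, False])

# Percent change within each source, converted from a fraction to %
ordered["Growth"] = ordered.groupby("sessionSource")["activeUsers"].pct_change() * 100

# Keep only the "current" rows, which now carry the growth value,
# and restore the original row order
result = ordered[ordered["dateRange"] == "current"].sort_index()
print(result)
```

Growth from a previous value of 0 (instagram.com here) comes out as inf, so you may want to handle that case separately.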
CodePudding user response:
You can use:
(df.sort_values(by=['sessionSource', 'dateRange'],
                ascending=[True, False])  # previous before current within each source
   .groupby('sessionSource', as_index=False)
   .agg({'dateRange': 'last',  # keep the "current" label
         'activeUsers': lambda s: s.pct_change().iloc[-1] * 100})
)
Output:
     sessionSource dateRange  activeUsers
0    instagram.com   current          inf
1  l.instagram.com   current   654.545455
2     snapchat.com   current   100.000000
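If the sort order feels fragile, a different technique (not the one used in the answers above) is to pivot the data so that current and previous become columns and then write out the growth formula explicitly. This is only a sketch: it assumes each (sessionSource, dateRange) pair appears exactly once, and replacing inf with NaN for a zero baseline is my own choice, not something the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "sessionSource": ["instagram.com", "instagram.com",
                      "l.instagram.com", "l.instagram.com",
                      "snapchat.com", "snapchat.com"],
    "dateRange": ["current", "previous"] * 3,
    "activeUsers": [5, 0, 83, 11, 2, 1],
})

# One row per source, with "current" and "previous" as columns
wide = df.pivot(index="sessionSource", columns="dateRange", values="activeUsers")

# Growth in % = (current - previous) / previous * 100
wide["Growth"] = (wide["current"] - wide["previous"]) / wide["previous"] * 100

# Optional (assumption): treat growth from a zero baseline as undefined
wide["Growth"] = wide["Growth"].replace([np.inf, -np.inf], np.nan)

print(wide.reset_index())
```

The result does not depend on row order, at the cost of reshaping the frame.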