I have a dataframe like this:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 30, size=10),
                  columns=["Random"],
                  index=pd.date_range("20180101", periods=10))
df = df.reset_index()
df.loc[:, 'Random'] = [31, 11, 21, 23, 43, 42, 51, 47, 49, 22]
df['id'] = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
df['diff'] = [3, 2, 4, 1, 6, 1, 2, 2, 3, 1]
df
I want to create a new column 'Percentage' using the 'diff' value of the same row and the 'Random' value of the previous row, within each id. For instance, the value in the 2nd row of 'Percentage' would be (2/31)*100. For the first row of each id, 'Percentage' will be NaN, since there is no previous 'Random' value to use. How can I do that? Without the per-id requirement I could do it with shift, but I'm not sure how to use groupby and shift at the same time (or whether something else is needed).
CodePudding user response:
You can use GroupBy.shift:
df['Percentage'] = df['diff'].div(df.groupby('id')['Random'].shift()).mul(100)
Output:
       index  Random  id  diff  Percentage
0 2018-01-01      31   1     3         NaN
1 2018-01-02      11   1     2    6.451613
2 2018-01-03      21   1     4   36.363636
3 2018-01-04      23   2     1         NaN
4 2018-01-05      43   2     6   26.086957
5 2018-01-06      42   2     1    2.325581
6 2018-01-07      51   3     2         NaN
7 2018-01-08      47   3     2    3.921569
8 2018-01-09      49   3     3    6.382979
9 2018-01-10      22   3     1    2.040816
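
To see why this works, it may help to split the one-liner into steps. The following is only a minimal sketch, assuming the same data as in the question (the date column is left out because it plays no role in the calculation); the names prev_random and ungrouped are introduced here purely for illustration:

```python
import pandas as pd

# Same values as in the question, without the date column.
df = pd.DataFrame({
    "Random": [31, 11, 21, 23, 43, 42, 51, 47, 49, 22],
    "id":     [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "diff":   [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
})

# Previous row's 'Random' within each id.
# The first row of every group has no predecessor, so it becomes NaN.
prev_random = df.groupby("id")["Random"].shift()

# Same arithmetic as the one-liner, written with plain operators.
df["Percentage"] = df["diff"] / prev_random * 100

# For contrast: a plain shift ignores the groups, so the first row of
# id 2 would be divided by the last 'Random' value of id 1.
ungrouped = df["diff"] / df["Random"].shift() * 100

print(df)
```

The grouped shift is what keeps values from leaking across id boundaries: in the ungrouped version, the row at position 3 is computed from the 'Random' value of the last row of id 1 instead of being NaN.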