Create a new column by using one column of same row and previous row of other column for each id

Time:07-06

I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 30, size=10),
                  columns=["Random"],
                  index=pd.date_range("20180101", periods=10))
df = df.reset_index()
df.loc[:, 'Random'] = [31, 11, 21, 23, 43, 42, 51, 47, 49, 22]
df['id'] = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
df['diff'] = [3, 2, 4, 1, 6, 1, 2, 2, 3, 1]
df

I want to create a new column 'Percentage' using the 'diff' column of the same row and the 'Random' column of the previous row, for each id. For instance, the value in the 2nd row of 'Percentage' would be (2/31)*100. For the first row of each id, 'Percentage' will be NaN, as there is no previous 'Random' value. How can I do that? I can do it with shift when it is not per id, but I am not sure how to combine groupby and shift (or something else).

Answer:

You can use GroupBy.shift, which shifts 'Random' down by one row within each id, so the first row of every id gets NaN:

df['Percentage'] = df['diff'].div(df.groupby('id')['Random'].shift()).mul(100)

Output:

       index  Random  id  diff  Percentage
0 2018-01-01      31   1     3         NaN
1 2018-01-02      11   1     2    6.451613
2 2018-01-03      21   1     4   36.363636
3 2018-01-04      23   2     1         NaN
4 2018-01-05      43   2     6   26.086957
5 2018-01-06      42   2     1    2.325581
6 2018-01-07      51   3     2         NaN
7 2018-01-08      47   3     2    3.921569
8 2018-01-09      49   3     3    6.382979
9 2018-01-10      22   3     1    2.040816
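
To make the difference from a plain shift concrete, here is a minimal self-contained sketch that rebuilds the same data and splits the one-liner into steps. The names prev_random and ungrouped are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Same data as in the question, built directly without the random placeholder.
df = pd.DataFrame({
    "index": pd.date_range("20180101", periods=10),
    "Random": [31, 11, 21, 23, 43, 42, 51, 47, 49, 22],
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
})

# Previous row's 'Random' within each id (illustrative name).
# The first row of each id has no previous row, so it becomes NaN.
prev_random = df.groupby("id")["Random"].shift()

# Same computation as the one-liner above, written with operators.
df["Percentage"] = df["diff"] / prev_random * 100

# For contrast: a plain shift ignores id boundaries, so row 3 (the first
# row of id 2) would use the last 'Random' of id 1 instead of NaN.
ungrouped = df["diff"] / df["Random"].shift() * 100
```

Note that the grouped shift relies on the rows already being in the desired order within each id. If they were not, sorting first (for example by 'id' and 'index') would probably be needed.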