Home > Software engineering >  Create a new columns in dataframe equaling differenciated series
Create a new columns in dataframe equaling differenciated series

Time:11-30

I want to create a new column diff aqualing the differenciation of a series in a nother column.

The following is my dataframe:

df=pd.DataFrame({
    'series_1' : [10.1, 15.3, 16, 12, 14.5, 11.8, 2.3, 7.7,5,10],
    'series_2' : [9.6,10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17, 8.16]
})

It has the following display:

series_1    series_2
0   10.1    9.60
1   15.3    10.40
2   16.0    11.20
3   12.0    3.30
4   14.5    6.00
5   11.8    4.00
6   2.3     1.94
7   7.7     15.44
8   5.0     6.17
9   10.0    8.16

Goal

Is to get the following output:

series_1    series_2    diff_2
0   10.1    9.60        NaN
1   15.3    10.40       0.80
2   16.0    11.20       0.80
3   12.0    3.30        -7.90
4   14.5    6.00        2.70
5   11.8    4.00        -2.00
6   2.3     1.94        -2.06
7   7.7     15.44       13.50
8   5.0     6.17        -9.27
9   10.0    8.16        1.99

My code

To reach the desired output I used the following code and it worked:

diff_2=[np.nan]

l=len(df)

for i in range(1, l):
    
    diff_2.append(df['series_2'][i] - df['series_2'][i-1])

df['diff_2'] = diff_2

Issue with my code

I replicated here a simplified dataframe, the real one I am working on is extremly large and my code took almost 9 minute runtime!

I want an alternative allowing me to get the output in a fast way,

Any suggestion from your side will be highly appreciated, thanks.

CodePudding user response:

here is one way to do it, using diff


# create a new col by taking difference b/w consecutive rows of DF using diff
df['diff_2']=df['series_2'].diff()
df
    series_1    series_2    diff_2
0   10.1    9.60    NaN
1   15.3    10.40   0.80
2   16.0    11.20   0.80
3   12.0    3.30    -7.90
4   14.5    6.00    2.70
5   11.8    4.00    -2.00
6   2.3     1.94    -2.06
7   7.7     15.44   13.50
8   5.0     6.17    -9.27
9   10.0    8.16    1.99

CodePudding user response:

You might want to add the following line of code:

df["diff_2"] = df["series_2"].sub(df["series_2"].shift(1))

to achieve your goal output:

   series_1  series_2  diff_2
0      10.1      9.60     NaN
1      15.3     10.40    0.80
2      16.0     11.20    0.80
3      12.0      3.30   -7.90
4      14.5      6.00    2.70
5      11.8      4.00   -2.00
6       2.3      1.94   -2.06
7       7.7     15.44   13.50
8       5.0      6.17   -9.27
9      10.0      8.16    1.99

That is a build-in pandas feature, so that should be optimized for good performance.

  • Related