I have a dataset with timestamps:
import numpy as np
import pandas as pd
data = pd.DataFrame({'date': pd.to_datetime(['1992-01-01', '1992-02-01',
                                             '1992-03-01', '1992-04-01',
                                             '1992-05-01']),
                     'sales': [10, 20, 30, 40, 50],
                     'price': [4302, 4323, 4199, 4397, 4159]})
I am trying to difference the price column with np.diff(data['price']). However, I want a reference point for the first row, which has the timestamp 1992-01-01.
My reference value is 4100, and I expect to get the dataset below:
date, sales, diff_price
1992-01-01, 10, 4302-4100
1992-02-01, 20, 4323-4302
1992-03-01, 30, 4199-4323
1992-04-01, 40, 4397-4199
1992-05-01, 50, 4159-4397
Is there an easy, Pythonic way to do this without changing the structure of data?
CodePudding user response:
We can use the prepend parameter of np.diff to set the reference value (4100) at the beginning of the Series:
reference_value = 4100
data['diff_price'] = np.diff(data['price'], prepend=reference_value)
Alternatively, we can use Series.shift with a fill_value of the reference (4100) and subtract:
reference_value = 4100
data['diff_price'] = (
    data['price'] - data['price'].shift(fill_value=reference_value)
)
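To see why the two approaches agree, it can help to look at the intermediate values. The sketch below rebuilds the price column from the question as a standalone Series and prints the shifted values next to the result of np.diff with prepend. The names prices, shifted and via_diff are only illustrative and are not part of the original code.

```python
import numpy as np
import pandas as pd

reference_value = 4100
prices = pd.Series([4302, 4323, 4199, 4397, 4159], name='price')

# shift() moves every value down one row; fill_value fills the gap at row 0
shifted = prices.shift(fill_value=reference_value)
print(shifted.tolist())   # [4100, 4302, 4323, 4199, 4397]

# prepend puts the reference in front, so the output keeps the input length
via_diff = np.diff(prices, prepend=reference_value)
print(via_diff.tolist())  # [202, 21, -124, 198, -238]
```

Subtracting shifted from prices gives the same five numbers as via_diff, which is why either form can be assigned directly to a new column.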
Either approach produces data:
        date  sales  price  diff_price
0 1992-01-01     10   4302         202
1 1992-02-01     20   4323          21
2 1992-03-01     30   4199        -124
3 1992-04-01     40   4397         198
4 1992-05-01     50   4159        -238
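If the goal is exactly the three-column layout from the question (date, sales, diff_price), one option is to drop price after computing the difference. This is a minimal end-to-end sketch under that assumption; the name result is only illustrative, and the original data frame is left unchanged apart from the new column.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'date': pd.to_datetime(['1992-01-01', '1992-02-01', '1992-03-01',
                            '1992-04-01', '1992-05-01']),
    'sales': [10, 20, 30, 40, 50],
    'price': [4302, 4323, 4199, 4397, 4159],
})
reference_value = 4100

# First row is differenced against the reference, the rest against the previous row
data['diff_price'] = np.diff(data['price'], prepend=reference_value)

# Optional: keep only the columns listed in the question's expected output
result = data.drop(columns='price')
print(result)
```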