How to use np.diff with a reference point in Python

Time:10-30

I have a dataset with timestamps.

import pandas as pd
data = pd.DataFrame({'date': pd.to_datetime(['1992-01-01', '1992-02-01',
                                             '1992-03-01', '1992-04-01',
                                             '1992-05-01']),
                      'sales': [10, 20, 30, 40, 50],
                      'price': [4302, 4323, 4199, 4397, 4159]})

I am trying to difference the price column with np.diff(data['price']). However, I want a reference point for the first row (timestamp 1992-01-01). My reference value is 4100, and I expect the following dataset:

    date,   sales,   diff_price
1992-01-01,  10,      4302-4100 
1992-02-01,  20,      4323-4302
1992-03-01,  30,      4199-4323
1992-04-01,  40,      4397-4199
1992-05-01,  50,      4159-4397

Is there an easy, Pythonic way to do this without changing the structure of the data?

CodePudding user response:

We can use the prepend parameter of np.diff to set the reference value (4100) at the beginning of the Series:

import numpy as np

reference_value = 4100
data['diff_price'] = np.diff(data['price'], prepend=reference_value)
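
To see what prepend does in isolation, here is a minimal sketch on a plain NumPy array, using the prices and reference value from the question. The names prices, diff_with_ref, and manual are illustrative only, and the sketch assumes NumPy 1.16 or later, where the prepend parameter is available.

```python
import numpy as np

prices = np.array([4302, 4323, 4199, 4397, 4159])
reference_value = 4100  # reference point taken from the question

# prepend places the reference before the first element, so the result
# has the same length as the input instead of one element fewer.
diff_with_ref = np.diff(prices, prepend=reference_value)

# The same thing done by hand: put the reference first, then difference.
manual = np.diff(np.concatenate(([reference_value], prices)))

print(len(np.diff(prices)), len(diff_with_ref))
print(diff_with_ref)
```

Because the output length matches the input length, the result can be assigned straight back to a DataFrame column.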

Alternatively, we can use Series.shift with a fill_value of the reference (4100) and subtract the shifted Series from the original:

reference_value = 4100
data['diff_price'] = (
        data['price'] - data['price'].shift(fill_value=reference_value)
)

Either approach produces the following data:

        date  sales  price  diff_price
0 1992-01-01     10   4302         202
1 1992-02-01     20   4323          21
2 1992-03-01     30   4199        -124
3 1992-04-01     40   4397         198
4 1992-05-01     50   4159        -238
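
As a quick check, the following self-contained sketch rebuilds the DataFrame from the question and confirms that the two approaches agree. The variable names via_numpy and via_shift are illustrative, not part of either library.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'date': pd.to_datetime(['1992-01-01', '1992-02-01',
                                             '1992-03-01', '1992-04-01',
                                             '1992-05-01']),
                     'sales': [10, 20, 30, 40, 50],
                     'price': [4302, 4323, 4199, 4397, 4159]})
reference_value = 4100

# Approach 1: np.diff with the reference prepended (returns a NumPy array).
via_numpy = np.diff(data['price'], prepend=reference_value)

# Approach 2: subtract the shifted column; fill_value fills the gap
# that shift leaves in the first row (returns a pandas Series).
via_shift = data['price'] - data['price'].shift(fill_value=reference_value)

# Both approaches should give identical values.
assert (via_numpy == via_shift.to_numpy()).all()

data['diff_price'] = via_shift
print(data)
```

The np.diff version returns a bare array, while the shift version keeps the original index, which can matter if the DataFrame is not indexed 0..n-1.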