Home > OS >  Pandas calculate diff() using the value of another column as keyword argument for 'periods'
Pandas calculate diff() using the value of another column as keyword argument for 'periods'

Time:12-09

I have a dataframe in which I would like to calculate the diff() on a certain column (Value). The number of periods to be calculated is coming from another column (Periods).

Periods Value Diff
0 32 0
1 35 3
2 37 5
2 41 6

I tried:

df['Diff'] = df['Value'].diff(periods=df['Periods'])

=> this returns TypeError: cannot convert the series to <class 'int'>

On different posts, I saw that we need to use .astype(int):

df['Diff'] = df['Value'].diff(periods=df['Periods'].astype(int))

=> but that gives the same TypeError: cannot convert the series to <class 'int'>

I also tried .values:

df['Diff'] = df['Value'].diff(periods=df['Periods'].values)

=> but that returns the TypeError: only size-1 arrays can be converted to Python scalars

Does anyone know what I am missing here?

CodePudding user response:

df['Diff'] = df.Value - pd.Series(df.Value[df.index - df.Period].values, df.index)

CodePudding user response:

You can do this manually. reset_index to ensure your DataFrame has a RangeIndex. Then subtract the periods from the Index to figure out which row you need to pull the 'Value' from and map that with the original DataFrame. Now you can manually subtract to calculate the difference, which incorporates the variable 'Periods'.

df = df.reset_index(drop=True)

df['Value_shift'] = (df.index - df['Periods']).map(df['Value'])
df['Diff'] = df['Value'] - df['Value_shift']

print(df)
   Periods  Value  Value_shift  Diff
0        0     32           32     0
1        1     35           32     3
2        2     37           32     5
3        2     41           35     6

CodePudding user response:

An approach could be the following - not the most efficient, but does the job:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Periods":[0,1,2,2], "Value":[32, 35, 37, 41]})

df["Diff"] = np.diag(df.apply(lambda x:df["Value"].diff(x["Periods"]), axis=1))

df

OUTPUT

   Periods  Value  Diff
0        0     32   0.0
1        1     35   3.0
2        2     37   5.0
3        2     41   6.0
  • Related