I have a dataframe in which I would like to calculate the diff() on a certain column (Value). The number of periods to be calculated is coming from another column (Periods).
Periods | Value | Diff |
---|---|---|
0 | 32 | 0 |
1 | 35 | 3 |
2 | 37 | 5 |
2 | 41 | 6 |
I tried:
df['Diff'] = df['Value'].diff(periods=df['Periods'])
=> this returns TypeError: cannot convert the series to <class 'int'>
On different posts, I saw that we need to use .astype(int):
df['Diff'] = df['Value'].diff(periods=df['Periods'].astype(int))
=> but that gives the same TypeError: cannot convert the series to <class 'int'>
I also tried .values:
df['Diff'] = df['Value'].diff(periods=df['Periods'].values)
=> but that returns the TypeError: only size-1 arrays can be converted to Python scalars
Does anyone know what I am missing here?
CodePudding user response:
df['Diff'] = df.Value - pd.Series(df.Value[df.index - df.Period].values, df.index)
CodePudding user response:
You can do this manually. reset_index
to ensure your DataFrame has a RangeIndex
. Then subtract the periods from the Index to figure out which row you need to pull the 'Value' from and map that with the original DataFrame. Now you can manually subtract to calculate the difference, which incorporates the variable 'Periods'.
df = df.reset_index(drop=True)
df['Value_shift'] = (df.index - df['Periods']).map(df['Value'])
df['Diff'] = df['Value'] - df['Value_shift']
print(df)
Periods Value Value_shift Diff
0 0 32 32 0
1 1 35 32 3
2 2 37 32 5
3 2 41 35 6
CodePudding user response:
An approach could be the following - not the most efficient, but does the job:
import pandas as pd
import numpy as np
df = pd.DataFrame({"Periods":[0,1,2,2], "Value":[32, 35, 37, 41]})
df["Diff"] = np.diag(df.apply(lambda x:df["Value"].diff(x["Periods"]), axis=1))
df
OUTPUT
Periods Value Diff
0 0 32 0.0
1 1 35 3.0
2 2 37 5.0
3 2 41 6.0