Pandas: Calculate running difference based on condition from another column

Time:11-09

I want to calculate the running difference of column ['Values'] based on a binary condition in another column ['Condition']. If the condition is 0, the result is the difference between the current row and the preceding row. If the condition is 1, the result is the difference between the current row and the previous row where the condition was also 1, like so:


  Values    Condition   Desired_Output
0   5000            1              NaN
1   5500            0            500.0
2   6700            1           1700.0
3   7100            0            400.0
4   8500            0           1400.0
5   9000            0            500.0
6   10500           1           3800.0
7   15750           0           5250.0
8   18000           1           7500.0
9   22250           0           4250.0
10  26000           0           3750.0
11  29750           0           3750.0
12  33500           0           3750.0
13  37250           0           3750.0
14  41000           1          23000.0
15  44750           0           3750.0
16  48500           1           7500.0
17  52250           1           3750.0
18  56000           0           3750.0
19  59750           1           7500.0
20  63500           0           3750.0
21  67250           0           3750.0
22  71000           0           3750.0
23  74750           0           3750.0
24  78500           0           3750.0
25  82250           1          22500.0
26  86000           0           3750.0
27  89750           1           7500.0

I tried using the groupby function, with no luck:

df.insert(2, 'Difference', (df.groupby('Condition')['Values'].diff()))

When I filter the dataframe on the condition and calculate the difference, I get close to the desired output, but then I have to work with two columns. Is there a way to do this in a single column? I am fairly new to Python and would appreciate some help.

CodePudding user response:

First calculate the difference between successive rows for the whole column with diff. Then select only the rows where the condition is 1, take diff of the Values column over that subset, and write the result back to those rows. The assignment lines up by index alignment, so each value lands on the row it was computed for.

m = df['Condition'].astype(bool)
df['res'] = df['Values'].diff()
df.loc[m, 'res'] = df.loc[m, 'Values'].diff()
print(df)
#     Values  Condition  Desired_Output      res
# 0     5000          1             NaN      NaN
# 1     5500          0           500.0    500.0
# 2     6700          1          1700.0   1700.0
# 3     7100          0           400.0    400.0
# 4     8500          0          1400.0   1400.0
# 5     9000          0           500.0    500.0
# 6    10500          1          3800.0   3800.0
# 7    15750          0          5250.0   5250.0
# 8    18000          1          7500.0   7500.0
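To see why index alignment makes the assignment above work, here is a minimal sketch on the first nine rows of the question's data. The intermediate name sub is only for illustration and is not part of the answer's code.

```python
import pandas as pd

# First nine rows of the data from the question.
df = pd.DataFrame({
    "Values": [5000, 5500, 6700, 7100, 8500, 9000, 10500, 15750, 18000],
    "Condition": [1, 0, 1, 0, 0, 0, 1, 0, 1],
})

m = df["Condition"].astype(bool)

# diff() over the Condition == 1 rows only. The result keeps the
# original index labels of those rows.
sub = df.loc[m, "Values"].diff()
print(sub.index.tolist())  # [0, 2, 6, 8]

# Plain row-to-row difference first, then overwrite the Condition == 1 rows.
# pandas matches sub to the target rows by index label.
df["res"] = df["Values"].diff()
df.loc[m, "res"] = sub
print(df)
```

Because sub carries labels 0, 2, 6 and 8, each value is written to the row it was computed for, and the Condition == 0 rows keep their plain difference.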

If you want a one-liner, you can do it using where, mask and ffill:

df['res2'] = df['Values'].diff().mask(m, df['Values'].where(m).ffill().diff())
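The one-liner can be hard to read at first, so here is a sketch that splits it into steps on the first nine rows of the question's data. The intermediate names (only_ones, carried, diff_ones, plain) are made up for this illustration.

```python
import pandas as pd

# First nine rows of the data from the question.
df = pd.DataFrame({
    "Values": [5000, 5500, 6700, 7100, 8500, 9000, 10500, 15750, 18000],
    "Condition": [1, 0, 1, 0, 0, 0, 1, 0, 1],
})
m = df["Condition"].astype(bool)

# Keep Values only on Condition == 1 rows, NaN elsewhere.
only_ones = df["Values"].where(m)

# Carry the last Condition == 1 value forward over the NaN rows.
carried = only_ones.ffill()

# On a Condition == 1 row, the previous entry of `carried` is the previous
# Condition == 1 value, so diff() gives the wanted difference there.
diff_ones = carried.diff()

# Plain row-to-row difference, replaced by diff_ones where m is True.
plain = df["Values"].diff()
df["res2"] = plain.mask(m, diff_ones)
print(df)
```

On the Condition == 0 rows diff_ones is 0 or NaN, but those entries are never used because mask only takes the replacement where m is True.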

CodePudding user response:

It sounds like you can calculate the differences for every row, and then override them on the Condition == 1 rows with the differences computed over those rows only.

This would look like this:

import pandas as pd

Values = [5000, 5500, 6700, 7100, 8500, 9000, 10500, 15750, 18000, 22250, 26000]
Condition = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]
df = pd.DataFrame(data={"Values": Values, "Condition": Condition})
# Row-to-row difference for every row
df["Desired_Output"] = df.Values.diff()
# Override the Condition == 1 rows with the difference taken over those rows only
df.loc[df.Condition == 1, "Desired_Output"] = df[df.Condition == 1].Values.diff()
print(df)

The output is:

    Values  Condition  Desired_Output
0     5000          1             NaN
1     5500          0           500.0
2     6700          1          1700.0
3     7100          0           400.0
4     8500          0          1400.0
5     9000          0           500.0
6    10500          1          3800.0
7    15750          0          5250.0
8    18000          1          7500.0
9    22250          0          4250.0
10   26000          0          3750.0
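If this is needed in more than one place, the same two steps could be wrapped in a small function. The sketch below uses a hypothetical helper name, conditional_diff, and also computes the groupby attempt from the question for comparison. The groupby version takes the difference within each Condition group, so the Condition == 0 rows are compared with the previous Condition == 0 row rather than with the row directly above.

```python
import pandas as pd


def conditional_diff(values: pd.Series, condition: pd.Series) -> pd.Series:
    """Hypothetical helper (name chosen for this sketch).

    Difference to the preceding row, except on rows where condition is 1,
    which get the difference to the previous row where condition was 1.
    """
    m = condition.astype(bool)
    out = values.diff()
    out.loc[m] = values[m].diff()
    return out


df = pd.DataFrame({
    "Values": [5000, 5500, 6700, 7100, 8500, 9000, 10500, 15750, 18000, 22250, 26000],
    "Condition": [1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0],
})

df["res"] = conditional_diff(df["Values"], df["Condition"])

# The attempt from the question, kept only for comparison.
df["groupby_diff"] = df.groupby("Condition")["Values"].diff()
print(df)
```

The two columns agree on the Condition == 1 rows. They can differ on Condition == 0 rows, for example row 3, where res is 7100 - 6700 and groupby_diff is 7100 - 5500.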