I have a DataFrame df1
with following columns: Date
, Direction
, Input
, Output
, and Amount
.
df1
Date Direction Input Output Amount
0 2022-01-02 In 18.5 0.0 1.0
1 2022-01-03 In 18.0 0.0 2.0
2 2022-01-04 Out 0.0 18.5 2.0
3 2022-01-05 In 16.0 0.0 1.0
4 2022-01-06 In 14.0 0.0 0.5
5 2022-01-07 Out 0.0 15.0 0.5
6 2022-01-08 Out 0.0 16.5 1.0
7 2022-01-09 Out 0.0 19.0 1.0
8 2022-01-10 In 13.0 0.0 0.9
9 2022-01-11 Out 0.0 15.0 0.9
10 2022-01-12 In 14.0 0.0 1.3
11 2022-01-13 In 12.0 0.0 1.4
I try to create an additional column; Difference
that calculates the Last In First Out difference between the Input
and Output
. If there is an output (df1['Direction'] == 'Out') on a specific date, I try to look back and calculate the difference to the last input, which is not already used for another output. In addition, I try to control that the amount of input and output matches.
The decided output df2
would look like this:
Date Direction Input Output Amount Difference
0 2022-01-02 In 18.5 0.0 1.0 0.0
1 2022-01-03 In 18.0 0.0 2.0 0.0
2 2022-01-04 Out 0.0 18.5 2.0 0.5 <-- 18.5-18
3 2022-01-05 In 16.0 0.0 1.0 0.0
4 2022-01-06 In 14.0 0.0 0.5 0.0
5 2022-01-07 Out 0.0 15.0 0.5 1.0 <-- 15-14
6 2022-01-08 Out 0.0 16.5 1.0 0.5 <-- 16.5-16 (2022-01-05)
7 2022-01-09 Out 0.0 19.0 1.0 0.5 <-- 19-18.5 (2022-01-02)
8 2022-01-10 In 13.0 0.0 0.9 0.0
9 2022-01-11 Out 0.0 15.0 0.9 2.0 <-- 15-13
10 2022-01-12 In 14.0 0.0 1.3 0.0
11 2022-01-13 In 12.0 0.0 1.4 0.0
I was trying it with np.where()
as condition and then substracting the shifted Input
, but I don't know how to shift further to the previous Input if there are several Outputs
after each other.
df1['Difference'] = np.where((df1['Direction'] == 'Out'), df1['Output']-df1['Input'].shift(1),0)
For reproducibility:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'Date':['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08', '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12', '2022-01-13'],
'Direction':['In', 'In', 'Out', 'In', 'In', 'Out', 'Out', 'Out', 'In', 'Out', 'In', 'In'],
'Input':[18.5, 18, 0, 16, 14, 0, 0, 0, 13, 0, 14, 12],
'Output':[0, 0, 18.5, 0, 0, 15, 16.5, 19, 0, 15, 0, 0],
'Amount':[1, 2, 2, 1, 0.5, 0.5, 1, 1, 0.9, 0.9, 1.3, 1.4]})
Many thanks!
CodePudding user response:
The proper description for the problem should Last In First Out: the last unused In
row is matched to each Out
row.
You can solve this using a stack-based approach with deque
:
from collections import deque
inputs = deque()
amount = []
for row in df1[["Direction", "Input", "Output"]].itertuples():
if row.Direction == "In":
inputs.append(row.Input)
amount.append(0)
else:
# Calculate Amount based on the last In value and remove it
amount.append(row.Output - inputs.pop())
df2 = df1.assign(Amount=amount)