Home > Software engineering >  Create DataFrame column with pairwise Last In First Out method as condition
Create DataFrame column with pairwise Last In First Out method as condition

Time:11-20

I have a DataFrame df1 with following columns: Date, Direction, Input, Output, and Amount.

df1
    Date        Direction   Input   Output  Amount
0   2022-01-02  In          18.5    0.0     1.0
1   2022-01-03  In          18.0    0.0     2.0
2   2022-01-04  Out         0.0     18.5    2.0
3   2022-01-05  In          16.0    0.0     1.0
4   2022-01-06  In          14.0    0.0     0.5
5   2022-01-07  Out         0.0     15.0    0.5
6   2022-01-08  Out         0.0     16.5    1.0
7   2022-01-09  Out         0.0     19.0    1.0
8   2022-01-10  In          13.0    0.0     0.9
9   2022-01-11  Out         0.0     15.0    0.9
10  2022-01-12  In          14.0    0.0     1.3
11  2022-01-13  In          12.0    0.0     1.4

I try to create an additional column; Difference that calculates the Last In First Out difference between the Input and Output. If there is an output (df1['Direction'] == 'Out') on a specific date, I try to look back and calculate the difference to the last input, which is not already used for another output. In addition, I try to control that the amount of input and output matches.

The decided output df2 would look like this:

    Date        Direction   Input   Output  Amount  Difference
0   2022-01-02  In          18.5    0.0     1.0     0.0
1   2022-01-03  In          18.0    0.0     2.0     0.0
2   2022-01-04  Out         0.0     18.5    2.0     0.5 <-- 18.5-18
3   2022-01-05  In          16.0    0.0     1.0     0.0
4   2022-01-06  In          14.0    0.0     0.5     0.0
5   2022-01-07  Out         0.0     15.0    0.5     1.0 <-- 15-14
6   2022-01-08  Out         0.0     16.5    1.0     0.5 <-- 16.5-16 (2022-01-05)
7   2022-01-09  Out         0.0     19.0    1.0     0.5 <-- 19-18.5 (2022-01-02)
8   2022-01-10  In          13.0    0.0     0.9     0.0
9   2022-01-11  Out         0.0     15.0    0.9     2.0 <-- 15-13
10  2022-01-12  In          14.0    0.0     1.3     0.0
11  2022-01-13  In          12.0    0.0     1.4     0.0

I was trying it with np.where() as condition and then substracting the shifted Input, but I don't know how to shift further to the previous Input if there are several Outputs after each other.

df1['Difference'] = np.where((df1['Direction'] == 'Out'), df1['Output']-df1['Input'].shift(1),0)

For reproducibility:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'Date':['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08', '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12', '2022-01-13'],
    'Direction':['In', 'In', 'Out', 'In', 'In', 'Out', 'Out', 'Out', 'In', 'Out', 'In', 'In'],
    'Input':[18.5, 18, 0, 16, 14, 0, 0, 0, 13, 0, 14, 12],
    'Output':[0, 0, 18.5, 0, 0, 15, 16.5, 19, 0, 15, 0, 0], 
    'Amount':[1, 2, 2, 1, 0.5, 0.5, 1, 1, 0.9, 0.9, 1.3, 1.4]})

Many thanks!

CodePudding user response:

The proper description for the problem should Last In First Out: the last unused In row is matched to each Out row.

You can solve this using a stack-based approach with deque:

from collections import deque

inputs = deque()
amount = []

for row in df1[["Direction", "Input", "Output"]].itertuples():
    if row.Direction == "In":
        inputs.append(row.Input)
        amount.append(0)
    else:
        # Calculate Amount based on the last In value and remove it
        amount.append(row.Output - inputs.pop())

df2 = df1.assign(Amount=amount)
  • Related