I have a data frame with two columns
df = DataFrame.from_records([
{"time": 10, "amount": 200},
{"time": 70, "amount": 1000},
{"time": 10, "amount": 300},
{"time": 10, "amount": 100},
])
I want to, given a period of time 80
ms, calculate the max amount that is possible, in this case, the output should be 1300 because, in this period, the maximum amount possible is 1300.
Is it possible with Pandas? I thought about using aggregate, but I do not know how to use it
CodePudding user response:
This is a knapsack problem, you can solve it with a dedicated library (e.g., knapsack
):
from knapsack import knapsack
total, idx = knapsack(df['time'], df['amount']).solve(80)
df.out = df.iloc[idx]
output:
time amount
1 70 1000
2 10 300
Other examples:
# with max = 75
time amount
1 70 1000
# with max = 40
time amount
0 10 200
2 10 300
3 10 100
CodePudding user response:
You could try to upsample you data to 10ms and then use a rolling window.
# set a time index to the dataframe
df2 = df.set_index(pd.to_timedelta(df['time'], unit='ms').cumsum())
it gives:
time amount
time
0 days 00:00:00.010000 10 200
0 days 00:00:00.080000 70 1000
0 days 00:00:00.090000 10 300
0 days 00:00:00.100000 10 100
We can now upsample the amount, assuming a linear increase between consecutive timestamps:
amounts = (df2.amount / df2.time).resample('10ms').bfill()
giving:
time
0 days 00:00:00.010000 20.000000
0 days 00:00:00.020000 14.285714
0 days 00:00:00.030000 14.285714
0 days 00:00:00.040000 14.285714
0 days 00:00:00.050000 14.285714
0 days 00:00:00.060000 14.285714
0 days 00:00:00.070000 14.285714
0 days 00:00:00.080000 14.285714
0 days 00:00:00.090000 30.000000
0 days 00:00:00.100000 10.000000
Freq: 10L, dtype: float64
Using a rolling window, we can now find the amount per 80ms duration:
amounts.rolling('80ms').sum()
which gives:
time
2022-01-01 00:00:00.010 20.000000
2022-01-01 00:00:00.020 34.285714
2022-01-01 00:00:00.030 48.571429
2022-01-01 00:00:00.040 62.857143
2022-01-01 00:00:00.050 77.142857
2022-01-01 00:00:00.060 91.428571
2022-01-01 00:00:00.070 105.714286
2022-01-01 00:00:00.080 120.000000
2022-01-01 00:00:00.090 130.000000
2022-01-01 00:00:00.100 125.714286
Freq: 10L, dtype: float64
We can see that the maximum value is reached after 90 ms and is 130.
If you only want the max value:
amounts.rolling('80ms').sum().max()
giving directly:
130.0