Home > Blockchain >  Pandas aggregate two columns at max
Pandas aggregate two columns at max

Time:10-25

I have a data frame with two columns

df = DataFrame.from_records([
  {"time": 10, "amount": 200},
  {"time": 70, "amount": 1000},
  {"time": 10, "amount": 300},
  {"time": 10, "amount": 100},
])

I want to, given a period of time 80ms, calculate the max amount that is possible, in this case, the output should be 1300 because, in this period, the maximum amount possible is 1300.

Is it possible with Pandas? I thought about using aggregate, but I do not know how to use it

CodePudding user response:

This is a knapsack problem, you can solve it with a dedicated library (e.g., knapsack):

from knapsack import knapsack
total, idx = knapsack(df['time'], df['amount']).solve(80)

df.out = df.iloc[idx]

output:

   time  amount
1    70    1000
2    10     300

Other examples:

# with max = 75
   time  amount
1    70    1000

# with max = 40
   time  amount
0    10     200
2    10     300
3    10     100

CodePudding user response:

You could try to upsample you data to 10ms and then use a rolling window.

# set a time index to the dataframe
df2 = df.set_index(pd.to_timedelta(df['time'], unit='ms').cumsum())

it gives:

                        time  amount
time                                
0 days 00:00:00.010000    10     200
0 days 00:00:00.080000    70    1000
0 days 00:00:00.090000    10     300
0 days 00:00:00.100000    10     100

We can now upsample the amount, assuming a linear increase between consecutive timestamps:

amounts = (df2.amount / df2.time).resample('10ms').bfill()

giving:

time
0 days 00:00:00.010000    20.000000
0 days 00:00:00.020000    14.285714
0 days 00:00:00.030000    14.285714
0 days 00:00:00.040000    14.285714
0 days 00:00:00.050000    14.285714
0 days 00:00:00.060000    14.285714
0 days 00:00:00.070000    14.285714
0 days 00:00:00.080000    14.285714
0 days 00:00:00.090000    30.000000
0 days 00:00:00.100000    10.000000
Freq: 10L, dtype: float64

Using a rolling window, we can now find the amount per 80ms duration:

amounts.rolling('80ms').sum()

which gives:

time
2022-01-01 00:00:00.010     20.000000
2022-01-01 00:00:00.020     34.285714
2022-01-01 00:00:00.030     48.571429
2022-01-01 00:00:00.040     62.857143
2022-01-01 00:00:00.050     77.142857
2022-01-01 00:00:00.060     91.428571
2022-01-01 00:00:00.070    105.714286
2022-01-01 00:00:00.080    120.000000
2022-01-01 00:00:00.090    130.000000
2022-01-01 00:00:00.100    125.714286
Freq: 10L, dtype: float64

We can see that the maximum value is reached after 90 ms and is 130.

If you only want the max value:

amounts.rolling('80ms').sum().max()

giving directly:

130.0
  • Related