Portfolio optimization problem: improving the O(n!) solution-CodePudding

There's a set of subscription products, each product has the following properties:

daily return rate
minimum allocatable amount
maximum allocatable amount

I'm trying to allocate the given amount to get the highest possible total daily return.

My current solution is a brute force recursive greedy algorithm with O(n!) complexity. I'm looking for at least a polynomial solution as running this against production data takes ages. I've tried to apply Dynamic Programming techniques but there is a non-integer value that changes at every step (the amount is real, after every allocation it becomes lower).

The O(n!) solution in Python (see the _allocator function):

from pprint import pprint
from decimal import Decimal as D
from dataclasses import dataclass

_RATE = D(1) / 365

@dataclass
class Product:
    name: str
    min: D
    max: D
    annual_rate: D # Rate of return (annualized)
    daily_returns: D = None

    def __post_init__(self):
        self.daily_returns = (1   self.annual_rate) ** _RATE - 1

@dataclass
class Allocation:
    product: Product
    amount: D
    daily_gain: D = None

    def __post_init__(self):
        self.daily_gain = self.product.daily_returns * self.amount

PRODUCTS = [
    Product('Bad', 1, 100, D('0.01')),
    Product('Flexible', 1, 100, D('0.02')),
    Product('Locked60', 20, 30, D('0.04')),
    Product('Locked90', 20, 35, D('0.05')),
    Product('Staking7', 1, 10, D('0.07')),
]

# Try all possible allocations recursively and pick the best one
# Complexity: O(n!)
def _allocator(amount, products, path):
    bgain, balloc = D(0), path

    for i, p in enumerate(products):
        if p.min <= amount:
            val = min(p.max, amount)
            sgain, salloc = _allocator(amount - val,
                                       products[:i]   products[i 1:],
                                       [*path, (p, val)])

            sgain  = val * p.daily_returns
            if sgain > bgain:
                bgain = sgain
                balloc = salloc

    return bgain, balloc

def balance_brute(amount, products):
    _, alloc = _allocator(amount, products, [])
    return [Allocation(p, a) for p, a in alloc]

allocs = balance_brute(D(100), PRODUCTS)
pprint(allocs)
print('Total gain:', sum(a.daily_gain for a in allocs))

CodePudding user response：

An integer program solver makes quick work of this problem. I used OR-Tools and specifically the CP-SAT backend so that the arithmetic is exact. If the minimum investable unit is not 1, scale appropriately.

from ortools.linear_solver import pywraplp


def balance(amount, products):
    solver = pywraplp.Solver.CreateSolver("SAT_INTEGER_PROGRAMMING")
    infinity = solver.infinity()
    amounts = []
    daily_returns = 0
    for product in products:
        indicator = solver.BoolVar("")
        x = solver.IntVar(0, product.max, "")
        solver.Add(int(product.min) * indicator <= x)
        solver.Add(x <= int(product.max) * indicator)
        amounts.append(x)
        daily_returns  = float(product.daily_returns) * x
    solver.Add(sum(amounts) <= int(amount))
    solver.Maximize(daily_returns)
    status = solver.Solve()
    if status == pywraplp.Solver.OPTIMAL:
        print("Objective value =", solver.Objective().Value())
        return [
            Allocation(product, D(x.solution_value()))
            for (product, x) in zip(products, amounts)
        ]


print(balance(D(100), PRODUCTS))

CodePudding user response：

I would try a Mixed-Integer Programming formulation first.

Advanced solvers have so-called semi-continuous variables, which allow a variable to assume values 0 or between [L,U]. That would give a model like:

max sum(i, x[i]*rate[i])
sum(i, x[i]) <= budget
δ[i] ∈ {0} ∪ [L[i],U[i]]

If your solver does not have semi-continuous variables, then use binary variables:

max sum(i, x[i]*rate[i])
sum(i, x[i]) <= budget
δ[i]*L[i] <= x[i] <= δ[i]*U[i];
δ[i] ∈ {0,1}

Not polynomial at all. But very fast: 2.2 seconds for 100k items (using random data). This is a good example where complexity analysis can be misleading: exponential worst-case behavior does not mean slow on a particular problem. This obvious statement is almost never taught it seems.

(I said earlier 0.05 seconds; that was not correct -- that was for an n=1,000 run). Correct time is:

Root node processing (before b&c):
  Real time             =    2.20 sec. (2328.41 ticks)
Parallel b&c, 16 threads:
  Real time             =    0.00 sec. (0.00 ticks)
  Sync time (average)   =    0.00 sec.
  Wait time (average)   =    0.00 sec.
                          ------------
Total (root branch&cut) =    2.20 sec. (2328.41 ticks)