Home > Blockchain >  Rounding series of numbers to get as much "round" numbers as possible and sum up to 100
Rounding series of numbers to get as much "round" numbers as possible and sum up to 100

Time:09-21

I have series of numbers that add up to nearly 100 (they are percentages), for example:

A B C
49.99 9.99 40.01

And I would like to adjust these to get something like:

A B C
50.00 10.00 40.00

Constraints :

  • The numbers count may vary,
  • The initial sum can be a little over/under 100 (I use a threshold of 0.1 : if the sum is below 99.9 or over 100.1 then I don't adjust),
  • sometimes there can be no "nice/round" numbers. In that case I want to distribute the missing/exceeding 1/100s in favor of the lowest numbers like so:
A B C D E F G
33.33 16.66 8.33 8.33 9.99 12.50 10.83
33.33 16.66 8.34 8.34 10.00 12.50 10.83

The function I currently use is working great for distributing the missing/exceeding 1.100s in a loop but does not address the 49.99 → 50.00 problem.

The principle it uses is :

  • multiply all the % by 100 (to get integers)
  • assign a "weight" to each integer based on the number of trailing zeroes (so that I adjust preferably the non-round numbers):
    • 30 when (int mod 1000)=0
    • 20 when (int mod 100)=0
    • 10 when (int mod 10)=0
    • 5 when (int mod 5) = 0
    • else 0.
  • calculate the missing/exceeding 1/100s for the lowest weight
  • sort the ints to get the lowest ints first (the sort takes the weight into account)
  • add/substract 1 to each int until I get a sum of 100.

The resulting function will be written in Postgres' Pl/PgSql but I'm mostly interested in knowing if such algorithms exist and how they are named (of course, a link to a working function would be very much appreciated).

CodePudding user response:

I ended up splitting the problem in 2 :

  1. adjust the original shares to get "nicer" numbers (49.99 → 50.00) and
  2. add a few 1/100s here and there to obtain a 100% total.

The result is satisfying, I even integrated a few special values for common fractions (200/3, 100/3, 100/6, 100/12...) so that 3x33.33 don't end up with 33.35, 33.35 and 33.30, less "nice/round" but more fair. Note there is only one loop for the final adjustments. Performance is acceptable : 3.5 sec for 100000 rows.

The following SO question and the included Wikipedia article helped me understand the possible biases and their pros/cons.

Here is the code for those interested :

CREATE OR REPLACE FUNCTION public.normalizeshares_weight(share INT) RETURNS INT AS
$BODY$
    SELECT CASE 
        WHEN share % 10000 = 0 THEN 40
        WHEN share % 1000 = 0 THEN 30
        WHEN share % 100 = 0  THEN 20
        WHEN share % 50 = 0 OR (share = ANY('{6666,3333,1666,833,416}')) THEN 15
        WHEN (share % 10 = 0) THEN 10
        WHEN share % 5 = 0 THEN 5
        ELSE 0 END;
$BODY$ LANGUAGE SQL IMMUTABLE;


CREATE OR REPLACE FUNCTION public.normalizeshares(shares NUMERIC[]) RETURNS NUMERIC(5,2)[] AS
$BODY$
DECLARE
    intshares INT[];
    adjshares INT[];
    weight   INT[];
    result NUMERIC[];
    nb0 INT = 0;
    nb5 INT = 0;
    nb10 INT = 0;
    nb15 INT = 0;
    nb20 INT = 0;
    nb30 INT = 0;
    nb40 INT = 0;
    initot INT = 0;
    tot INT = 0;
    nb INT = 0;
    w INT = 0;
    diff INT;
    each INT;
    bestweight INT;
BEGIN
    FOR i IN 1..ARRAY_LENGTH(shares,1) LOOP
        intshares[i] := FLOOR(COALESCE(shares[i],0)*100);
        weight[i] := normalizeshares_weight(intshares[i]);
        bestweight := weight[i];
        adjshares[i] := intshares[i];
        IF normalizeshares_weight(intshares[i] 1) > bestweight THEN adjshares[i] := intshares[i] 1; bestweight := normalizeshares_weight(intshares[i] 1); END IF;
        IF normalizeshares_weight(intshares[i] 2) > bestweight THEN adjshares[i] := intshares[i] 2; bestweight := normalizeshares_weight(intshares[i] 2); END IF;
        IF normalizeshares_weight(intshares[i] 3) > bestweight THEN adjshares[i] := intshares[i] 2; bestweight := normalizeshares_weight(intshares[i] 3); END IF;
        IF normalizeshares_weight(intshares[i]-1) > bestweight THEN adjshares[i] := intshares[i]-1; bestweight := normalizeshares_weight(intshares[i]-1); END IF;
        IF normalizeshares_weight(intshares[i]-2) > bestweight THEN adjshares[i] := intshares[i]-2; bestweight := normalizeshares_weight(intshares[i]-2); END IF;
        IF normalizeshares_weight(intshares[i]-3) > bestweight THEN adjshares[i] := intshares[i]-2; bestweight := normalizeshares_weight(intshares[i]-3); END IF;
        tot := tot   adjshares[i];
        initot := initot   intshares[i];
        weight[i] := bestweight; -- normalizeshares_weight(adjshares[i]);   already calculated
        IF    weight[i]=0  THEN nb0  := nb0    1;
        ELSIF weight[i]=5  THEN nb5  := nb5    1;
        ELSIF weight[i]=10 THEN nb10 := nb10   1;
        ELSIF weight[i]=15 THEN nb15 := nb15   1;
        ELSIF weight[i]=20 THEN nb20 := nb20   1;
        ELSIF weight[i]=30 THEN nb30 := nb30   1;
        ELSIF weight[i]=40 THEN nb40 := nb40   1;
        END IF;
        result[i] := (intshares[i]::NUMERIC / 100)::NUMERIC(5,2);
    END LOOP;
    IF tot=10000 THEN
        -- RAISE NOTICE 'adjtot=100.00 : %',adjshares::numeric[];
        FOR i IN 1..ARRAY_LENGTH(shares,1) LOOP
            result[i] := (adjshares[i]::NUMERIC / 100)::NUMERIC(5,2);
        END LOOP;
    ELSIF (initot=10000) OR (ABS(10000-tot)>90) THEN
        -- RAISE NOTICE 'No adj needed, initot=%, tot=%',initot,tot;    
    ELSE
        IF    nb0  > 0 THEN nb := nb0;  w := 0;
        ELSIF nb5  > 0 THEN nb := nb5;  w := 5;
        ELSIF nb10 > 0 THEN nb := nb10; w := 10;
        ELSIF nb15 > 0 THEN nb := nb15; w := 15;
        ELSIF nb20 > 0 THEN nb := nb20; w := 20;
        ELSIF nb30 > 0 THEN nb := nb30; w := 30;
        ELSIF nb40 > 0 THEN nb := nb40; w := 40;
        END IF;
        diff := 10000 - tot;
        each := diff/nb diff/abs(diff);
        -- RAISE NOTICE 'nb=%, w=%, diff=%, tot=%, adj=%',nb,w,diff,tot,adjshares::numeric[];
        FOR i IN 1..ARRAY_LENGTH(shares,1) LOOP
            IF weight[i]=w THEN
                IF diff=0 THEN
                ELSIF nb=1 THEN
                    adjshares[i] := adjshares[i]   diff;
                ELSIF nb>1 THEN
                    adjshares[i] := adjshares[i]   each;
                    diff := diff - each;
                END IF;
                nb := nb -1;
            END IF;
            result[i] := (adjshares[i]::NUMERIC / 100)::NUMERIC(5,2);
        END LOOP;
    END IF;
    RETURN result;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;

And a few results :

% select normalizeshares('{49.99,9.99,40.01}');
   normalizeshares   
---------------------
 {50.00,10.00,40.00}

% select normalizeshares('{33.33,16.66,8.33,8.33,9.99,12.5,10.83}');
              normalizeshares              
-------------------------------------------
 {33.33,16.66,8.33,8.33,10.00,12.50,10.85}
  • Related