Rounding series of numbers to get as much "round" numbers as possible and sum up to 100-CodePudding

I have series of numbers that add up to nearly 100 (they are percentages), for example:

A	B	C
49.99	9.99	40.01

And I would like to adjust these to get something like:

A	B	C
50.00	10.00	40.00

Constraints :

The numbers count may vary,
The initial sum can be a little over/under 100 (I use a threshold of 0.1 : if the sum is below 99.9 or over 100.1 then I don't adjust),
sometimes there can be no "nice/round" numbers. In that case I want to distribute the missing/exceeding 1/100s in favor of the lowest numbers like so:

A	B	C	D	E	F	G
33.33	16.66	8.33	8.33	9.99	12.50	10.83
33.33	16.66	8.34	8.34	10.00	12.50	10.83

The function I currently use is working great for distributing the missing/exceeding 1.100s in a loop but does not address the 49.99 → 50.00 problem.

The principle it uses is :

multiply all the % by 100 (to get integers)
assign a "weight" to each integer based on the number of trailing zeroes (so that I adjust preferably the non-round numbers):
- 30 when (int mod 1000)=0
- 20 when (int mod 100)=0
- 10 when (int mod 10)=0
- 5 when (int mod 5) = 0
- else 0.
calculate the missing/exceeding 1/100s for the lowest weight
sort the ints to get the lowest ints first (the sort takes the weight into account)
add/substract 1 to each int until I get a sum of 100.

The resulting function will be written in Postgres' Pl/PgSql but I'm mostly interested in knowing if such algorithms exist and how they are named (of course, a link to a working function would be very much appreciated).

CodePudding user response：

I ended up splitting the problem in 2 :

adjust the original shares to get "nicer" numbers (49.99 → 50.00) and
add a few 1/100s here and there to obtain a 100% total.

The result is satisfying, I even integrated a few special values for common fractions (200/3, 100/3, 100/6, 100/12...) so that 3x33.33 don't end up with 33.35, 33.35 and 33.30, less "nice/round" but more fair. Note there is only one loop for the final adjustments. Performance is acceptable : 3.5 sec for 100000 rows.

The following SO question and the included Wikipedia article helped me understand the possible biases and their pros/cons.

Here is the code for those interested :

CREATE OR REPLACE FUNCTION public.normalizeshares_weight(share INT) RETURNS INT AS
$BODY$
    SELECT CASE 
        WHEN share % 10000 = 0 THEN 40
        WHEN share % 1000 = 0 THEN 30
        WHEN share % 100 = 0  THEN 20
        WHEN share % 50 = 0 OR (share = ANY('{6666,3333,1666,833,416}')) THEN 15
        WHEN (share % 10 = 0) THEN 10
        WHEN share % 5 = 0 THEN 5
        ELSE 0 END;
$BODY$ LANGUAGE SQL IMMUTABLE;


CREATE OR REPLACE FUNCTION public.normalizeshares(shares NUMERIC[]) RETURNS NUMERIC(5,2)[] AS
$BODY$
DECLARE
    intshares INT[];
    adjshares INT[];
    weight   INT[];
    result NUMERIC[];
    nb0 INT = 0;
    nb5 INT = 0;
    nb10 INT = 0;
    nb15 INT = 0;
    nb20 INT = 0;
    nb30 INT = 0;
    nb40 INT = 0;
    initot INT = 0;
    tot INT = 0;
    nb INT = 0;
    w INT = 0;
    diff INT;
    each INT;
    bestweight INT;
BEGIN
    FOR i IN 1..ARRAY_LENGTH(shares,1) LOOP
        intshares[i] := FLOOR(COALESCE(shares[i],0)*100);
        weight[i] := normalizeshares_weight(intshares[i]);
        bestweight := weight[i];
        adjshares[i] := intshares[i];
        IF normalizeshares_weight(intshares[i] 1) > bestweight THEN adjshares[i] := intshares[i] 1; bestweight := normalizeshares_weight(intshares[i] 1); END IF;
        IF normalizeshares_weight(intshares[i] 2) > bestweight THEN adjshares[i] := intshares[i] 2; bestweight := normalizeshares_weight(intshares[i] 2); END IF;
        IF normalizeshares_weight(intshares[i] 3) > bestweight THEN adjshares[i] := intshares[i] 2; bestweight := normalizeshares_weight(intshares[i] 3); END IF;
        IF normalizeshares_weight(intshares[i]-1) > bestweight THEN adjshares[i] := intshares[i]-1; bestweight := normalizeshares_weight(intshares[i]-1); END IF;
        IF normalizeshares_weight(intshares[i]-2) > bestweight THEN adjshares[i] := intshares[i]-2; bestweight := normalizeshares_weight(intshares[i]-2); END IF;
        IF normalizeshares_weight(intshares[i]-3) > bestweight THEN adjshares[i] := intshares[i]-2; bestweight := normalizeshares_weight(intshares[i]-3); END IF;
        tot := tot   adjshares[i];
        initot := initot   intshares[i];
        weight[i] := bestweight; -- normalizeshares_weight(adjshares[i]);   already calculated
        IF    weight[i]=0  THEN nb0  := nb0    1;
        ELSIF weight[i]=5  THEN nb5  := nb5    1;
        ELSIF weight[i]=10 THEN nb10 := nb10   1;
        ELSIF weight[i]=15 THEN nb15 := nb15   1;
        ELSIF weight[i]=20 THEN nb20 := nb20   1;
        ELSIF weight[i]=30 THEN nb30 := nb30   1;
        ELSIF weight[i]=40 THEN nb40 := nb40   1;
        END IF;
        result[i] := (intshares[i]::NUMERIC / 100)::NUMERIC(5,2);
    END LOOP;
    IF tot=10000 THEN
        -- RAISE NOTICE 'adjtot=100.00 : %',adjshares::numeric[];
        FOR i IN 1..ARRAY_LENGTH(shares,1) LOOP
            result[i] := (adjshares[i]::NUMERIC / 100)::NUMERIC(5,2);
        END LOOP;
    ELSIF (initot=10000) OR (ABS(10000-tot)>90) THEN
        -- RAISE NOTICE 'No adj needed, initot=%, tot=%',initot,tot;    
    ELSE
        IF    nb0  > 0 THEN nb := nb0;  w := 0;
        ELSIF nb5  > 0 THEN nb := nb5;  w := 5;
        ELSIF nb10 > 0 THEN nb := nb10; w := 10;
        ELSIF nb15 > 0 THEN nb := nb15; w := 15;
        ELSIF nb20 > 0 THEN nb := nb20; w := 20;
        ELSIF nb30 > 0 THEN nb := nb30; w := 30;
        ELSIF nb40 > 0 THEN nb := nb40; w := 40;
        END IF;
        diff := 10000 - tot;
        each := diff/nb diff/abs(diff);
        -- RAISE NOTICE 'nb=%, w=%, diff=%, tot=%, adj=%',nb,w,diff,tot,adjshares::numeric[];
        FOR i IN 1..ARRAY_LENGTH(shares,1) LOOP
            IF weight[i]=w THEN
                IF diff=0 THEN
                ELSIF nb=1 THEN
                    adjshares[i] := adjshares[i]   diff;
                ELSIF nb>1 THEN
                    adjshares[i] := adjshares[i]   each;
                    diff := diff - each;
                END IF;
                nb := nb -1;
            END IF;
            result[i] := (adjshares[i]::NUMERIC / 100)::NUMERIC(5,2);
        END LOOP;
    END IF;
    RETURN result;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;

And a few results :

% select normalizeshares('{49.99,9.99,40.01}');
   normalizeshares   
---------------------
 {50.00,10.00,40.00}

% select normalizeshares('{33.33,16.66,8.33,8.33,9.99,12.5,10.83}');
              normalizeshares              
-------------------------------------------
 {33.33,16.66,8.33,8.33,10.00,12.50,10.85}