How to re-scale my data while keeping the same distribution

Time:10-26

   month  value
0      1    866
1      2    274
2      3    975
3      4    792
4      5    512
5      6    610
6      7    980
7      8    984
8      9    602
9     10    177

Say I have this data. Is it possible to redistribute it over, say, 6 months while keeping the same distribution?

Hope I'm not unclear and excuse me if I got the terminology wrong. Thank you.

CodePudding user response:

There's no way to know whether each value is concentrated at one end of its month or the other, so the simplest approach is to assume it's spread evenly over the month. Each output month then covers 10/6 input months. For example, output month 1 consists of 6/6 of input month 1 and 4/6 of input month 2; output month 2 consists of the remainder of input month 2 (2/6), 6/6 of input month 3, and 2/6 of input month 4. The code is actually simpler than the explanation:

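import math

values = [866, 274, 975, 792, 512, 610, 980, 984, 602, 177]
output = []
fill_count = 0  # parts (1/out_len of an input month) already in the current output month
new_value = 0   # running total for the current output month
in_len = len(values)
out_len = 6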
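for value in values:
    portion = out_len  # each input month is split into out_len equal parts
    while portion > 0:
        # Take as many parts as still fit into the current output month.
        using = min(portion, in_len - fill_count)
        new_value += using * value / out_len
        portion -= using
        fill_count += using
        if fill_count >= in_len:
            # The output month is full (in_len parts): store it and start the next one.
            output.append(new_value)
            new_value = 0
            fill_count = 0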
print(output)
# [1048.6666666666667, 1330.3333333333333, 1040.0, 1263.3333333333335, 1511.3333333333335, 578.3333333333333]
print(math.isclose(sum(values), sum(output)))
# True
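
The same even-spread assumption can also be written as interpolation of the running total: take the cumulative sum at each input month boundary, read it off at the output month boundaries, and take differences. The following is only a sketch of that alternative, not the answer's original method; the names rescale and total_up_to are made up for illustration.

```python
import math
from itertools import accumulate


def rescale(values, out_len):
    """Redistribute values over out_len equal-width bins, assuming each
    input value is spread evenly across its own bin (hypothetical helper)."""
    in_len = len(values)
    # Running total at each input bin boundary, starting from 0.
    cumulative = [0, *accumulate(values)]

    def total_up_to(position):
        # Linearly interpolate the running total at a fractional bin position.
        index = min(int(position), in_len - 1)
        return cumulative[index] + (position - index) * values[index]

    # Output bin boundaries, measured in input bins.
    edges = [i * in_len / out_len for i in range(out_len + 1)]
    return [total_up_to(b) - total_up_to(a) for a, b in zip(edges, edges[1:])]


values = [866, 274, 975, 792, 512, 610, 980, 984, 602, 177]
output = rescale(values, 6)
print(output)
print(math.isclose(sum(values), sum(output)))
```

Up to floating-point rounding this should give the same six values as the loop above, and the total is preserved because the differences of the running total telescope. The same function also works when out_len is larger than the number of input months.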