Home > Blockchain >  Find the mid point where the sum of upper half and lower half of data.table is almost equal
Find the mid point where the sum of upper half and lower half of data.table is almost equal

Time:09-16

I have a data.table in which I want to find the column ID at which the sum of all rows above and below that ID is equal or almost equal.

dt = x <- structure(list(value = c(7.496, 11.073, 11.329, 9.282, 8.748, 12.515, 7.46, 9.189, 9.62, 5.815, 5.945, 
                                   7.778, 10.077, 15.311, 8.591, 6.048, 7.568, 6.14, 6.591, 5.376, 
                                   8.038, 7.496, 7.983, 6.591, 6.591, 7.44, 6.453, 11.589, 5.751, 
                                   8.464, 7.577, 6.014, 12.733, 7.108, 14.857, 15.503, 12.468, 13.39, 
                                   10.796, 10.923, 7.215, 13.72, 7.574, 11.77, 10.409, 7.591, 6.174, 
                                   6.748, 10.091, 9.8, 6.527, 9.251, 6.622, 13.742, 4.454, 8.331, 
                                   7.702, 7.197, 9.629, 9.76, 3.663, 19.55, 8.107, 9.637, 10.146, 
                                   9.564, 6.947, 14.45, 10.266, 5.457, 10.629, 6.275, 2.48, 4.513, 
                                   6.755, 2.885, 5.773, 2.855, 2.429, 2.955, 2.486, 3.239, 4.29, 
                                   3.043, 3.501, 3.276, 4.018, 2.727, 5.199, 2.371, 3.732, 2.533, 
                                   4.482, 3.215, 7.782, 3.435, 4.201, 3.074, 3.475, 2.923, 3.025, 
                                   4.308, 3.932, 2.923, 3.491, 2.852, 3.916), ID = 1:107), row.names = c(NA, 
                                                                                                         -107L), class = "data.frame")

> dt
     value  ID
1    7.496   1
2   11.073   2
3   11.329   3
4    9.282   4
5    8.748   5
6   12.515   6
7    7.460   7
8    9.189   8
9    9.620   9
10   5.815  10
11   5.945  11
12   7.778  12
13  10.077  13
14  15.311  14
15   8.591  15
16   6.048  16
17   7.568  17
18   6.140  18
19   6.591  19
20   5.376  20
21   8.038  21
22   7.496  22
23   7.983  23
24   6.591  24
25   6.591  25
26   7.440  26
27   6.453  27
28  11.589  28
29   5.751  29
30   8.464  30
31   7.577  31
32   6.014  32
33  12.733  33
34   7.108  34
35  14.857  35
...
...

One of the solutions is to split the data.table into two half's and then manually check the split point

split(dt, cumsum(dt$value) <= sum(dt$value)/2)

However, I just need the ID where the sum of upper and lower values is almost equal. Is there a solution without using the split as using the split function is very inefficient on a large dataset?

As given in the example dt, the number of rows is even, so the expected result in this particular case should be 43.5.

CodePudding user response:

With which.min:

which.min(abs(cumsum(dt$value) - (sum(dt$value) / 2)))
#[1] 43

In the case where IDs are different from row numbers, you can index:

dt$ID[which.min(abs(cumsum(dt$value) - (sum(dt$value) / 2)))]
#[1] 43
  • Related