I have a data.table
in which I want to find the column ID at which the sum of all rows above and below that ID is equal or almost equal.
dt = x <- structure(list(value = c(7.496, 11.073, 11.329, 9.282, 8.748, 12.515, 7.46, 9.189, 9.62, 5.815, 5.945,
7.778, 10.077, 15.311, 8.591, 6.048, 7.568, 6.14, 6.591, 5.376,
8.038, 7.496, 7.983, 6.591, 6.591, 7.44, 6.453, 11.589, 5.751,
8.464, 7.577, 6.014, 12.733, 7.108, 14.857, 15.503, 12.468, 13.39,
10.796, 10.923, 7.215, 13.72, 7.574, 11.77, 10.409, 7.591, 6.174,
6.748, 10.091, 9.8, 6.527, 9.251, 6.622, 13.742, 4.454, 8.331,
7.702, 7.197, 9.629, 9.76, 3.663, 19.55, 8.107, 9.637, 10.146,
9.564, 6.947, 14.45, 10.266, 5.457, 10.629, 6.275, 2.48, 4.513,
6.755, 2.885, 5.773, 2.855, 2.429, 2.955, 2.486, 3.239, 4.29,
3.043, 3.501, 3.276, 4.018, 2.727, 5.199, 2.371, 3.732, 2.533,
4.482, 3.215, 7.782, 3.435, 4.201, 3.074, 3.475, 2.923, 3.025,
4.308, 3.932, 2.923, 3.491, 2.852, 3.916), ID = 1:107), row.names = c(NA,
-107L), class = "data.frame")
> dt
value ID
1 7.496 1
2 11.073 2
3 11.329 3
4 9.282 4
5 8.748 5
6 12.515 6
7 7.460 7
8 9.189 8
9 9.620 9
10 5.815 10
11 5.945 11
12 7.778 12
13 10.077 13
14 15.311 14
15 8.591 15
16 6.048 16
17 7.568 17
18 6.140 18
19 6.591 19
20 5.376 20
21 8.038 21
22 7.496 22
23 7.983 23
24 6.591 24
25 6.591 25
26 7.440 26
27 6.453 27
28 11.589 28
29 5.751 29
30 8.464 30
31 7.577 31
32 6.014 32
33 12.733 33
34 7.108 34
35 14.857 35
...
...
One of the solutions is to split the data.table into two half's and then manually check the split point
split(dt, cumsum(dt$value) <= sum(dt$value)/2)
However, I just need the ID where the sum of upper and lower values is almost equal. Is there a solution without using the split as using the split function is very inefficient on a large dataset?
As given in the example dt
, the number of rows is even, so the expected result in this particular case should be 43.5
.
CodePudding user response:
With which.min
:
which.min(abs(cumsum(dt$value) - (sum(dt$value) / 2)))
#[1] 43
In the case where IDs are different from row numbers, you can index:
dt$ID[which.min(abs(cumsum(dt$value) - (sum(dt$value) / 2)))]
#[1] 43