Home > Net >  How to combine two dataframe based on column route?
How to combine two dataframe based on column route?

Time:05-10

Column route represents the point for each location

So I have two tables like this, my question is how to calculate the sum of the load difference between the two tables if the route has to match with every location [1,2,3,4,5]

for example:

df1['Route'] = [1,5] and df2['Route'] = [2,3,4] then the result is [[1,5],[2,3,4]] with sum of load difference = 368.93 (-10325.19) = -9956,26

df1

index Route Total Distance Total Demand Total Time Total Volume Load Difference
8 1,5 2257 642.46 173 4631.07 368.93
18 1,3,4 2179 633.65 167 4409.15 590.85
7 1,4 2152 538.2 165 3700.13 1299.87
24 3,4,5 2086 508.09 160 3387.27 1612.73
6 1,3 2146 479.46 165 3296.29 1703.71
0 1 2104 384.01 161 2576.07 2423.93
14 4,5 1775 412.64 136 2338.29 2661.71
13 3,5 2029 353.9 156 2300.35 2699.65
12 3,4 1881 249.64 144 1497.84 3502.16
4 5 1570 258.45 120 1292.25 3707.75
3 4 1570 154.19 120 770.95 4229.05
2 3 1848 95.45 142 564.75 4435.25
19 1,3,5 2327 737.91 179 5503.58 -503.58
20 1,4,5 2357 796.65 181 6008.07 -1008.07
28 1,3,4,5 2384 892.1 183 6802.26 -1802.26
1 2 1960 2510.39 150 15689.94 -10689.94
9 2,3 1986 2605.84 152 16503.65 -11503.65
10 2,4 1994 2664.58 153 16986.7 -11986.7
21 2,3,4 2019 2760.03 155 17825.19 -12825.19
11 2,5 2097 2768.84 161 18574.3 -13574.3
5 1,2 2140 2894.4 164 19778.4 -14778.4
22 2,3,5 2167 2864.29 166 19811.34 -14811.34
23 2,4,5 2199 2923.03 169 20583.0 -15583.0
15 1,2,3 2166 2989.85 166 20679.8 -15679.8
16 1,2,4 2174 3048.59 167 21213.11 -16213.11
29 2,3,4,5 2224 3018.48 171 21506.67 -16506.67
25 1,2,3,4 2199 3144.04 169 22139.28 -17139.28
17 1,2,5 2277 3152.85 175 22989.53 -17989.53
26 1,2,3,5 2347 3248.3 180 24362.25 -19362.25
27 1,2,4,5 2379 3307.04 183 25216.18 -20216.18
30 1,2,3,4,5 2404 3402.49 184 26085.76 -21085.76

df2

index Route Total Distance Total Demand Total Time Total Volume Load Difference
28 1,3,4,5 2384 892.1 183 6802.26 697.74
20 1,4,5 2357 796.65 181 6008.07 1491.93
19 1,3,5 2327 737.91 179 5503.58 1996.42
8 1,5 2257 642.46 173 4631.07 2868.93
18 1,3,4 2179 633.65 167 4409.15 3090.85
7 1,4 2152 538.2 165 3700.13 3799.87
24 3,4,5 2086 508.09 160 3387.27 4112.73
6 1,3 2146 479.46 165 3296.29 4203.71
0 1 2104 384.01 161 2576.07 4923.93
14 4,5 1775 412.64 136 2338.29 5161.71
13 3,5 2029 353.9 156 2300.35 5199.65
12 3,4 1881 249.64 144 1497.84 6002.16
4 5 1570 258.45 120 1292.25 6207.75
3 4 1570 154.19 120 770.95 6729.05
2 3 1848 95.45 142 564.75 6935.25
1 2 1960 2510.39 150 15689.94 -8189.94
9 2,3 1986 2605.84 152 16503.65 -9003.65
10 2,4 1994 2664.58 153 16986.7 -9486.7
21 2,3,4 2019 2760.03 155 17825.19 -10325.19
11 2,5 2097 2768.84 161 18574.3 -11074.3
5 1,2 2140 2894.4 164 19778.4 -12278.4
22 2,3,5 2167 2864.29 166 19811.34 -12311.34
23 2,4,5 2199 2923.03 169 20583.0 -13083.0
15 1,2,3 2166 2989.85 166 20679.8 -13179.8
16 1,2,4 2174 3048.59 167 21213.11 -13713.11
29 2,3,4,5 2224 3018.48 171 21506.67 -14006.67
25 1,2,3,4 2199 3144.04 169 22139.28 -14639.28
17 1,2,5 2277 3152.85 175 22989.53 -15489.53
26 1,2,3,5 2347 3248.3 180 24362.25 -16862.25
27 1,2,4,5 2379 3307.04 183 25216.18 -17716.18
30 1,2,3,4,5 2404 3402.49 184 26085.76 -18585.76

CodePudding user response:

Answer

import numpy as np

prime_map = {k:v for k,v in zip("12345", [2,3,5,7,11])}

def road_key_to_number(x):
    return np.prod(list(prime_map[k] for k in x.split(",")))


df1["key"] = df1["Route"].apply(road_key_to_number)
df2["key"] = df2["Route"].apply(road_key_to_number)
df2["pair_key"] = np.prod(list(prime_map.values())) / df2["key"]
data = df1.merge(df2, left_on = "key", right_on="pair_key")
data["diff"] = (data["Load Difference_x"]   data["Load Difference_y"])

Output

    Route_x  Route_y      diff
0       1,5    2,3,4  -9956.26
1     1,3,4      2,5 -10483.45
2       1,4    2,3,5 -11011.47
3     3,4,5      1,2 -10665.67
4       1,3    2,4,5 -11379.29
5         1  2,3,4,5 -11582.74
6       4,5    1,2,3 -10518.09
7       3,5    1,2,4 -11013.46
8       3,4    1,2,5 -11987.37
9         5  1,2,3,4 -10931.53
10        4  1,2,3,5 -12633.20
11        3  1,2,4,5 -13280.93
12    1,3,5      2,4  -9990.28
13    1,4,5      2,3 -10011.72
14  1,3,4,5        2  -9992.20
15        2  1,3,4,5  -9992.20
16      2,3    1,4,5 -10011.72
17      2,4    1,3,5  -9990.28
18    2,3,4      1,5  -9956.26
19      2,5    1,3,4 -10483.45
20      1,2    3,4,5 -10665.67
21    2,3,5      1,4 -11011.47
22    2,4,5      1,3 -11379.29
23    1,2,3      4,5 -10518.09
24    1,2,4      3,5 -11013.46
25  2,3,4,5        1 -11582.74
26  1,2,3,4        5 -10931.53
27    1,2,5      3,4 -11987.37
28  1,2,3,5        4 -12633.20
29  1,2,4,5        3 -13280.93

Explanation

  • The key idea is to match two df.Route to calculate the diff between them.
  • Here prime_map is using a method called Arithmeticization, it has been used in Metamathematics to prove Gödel's incompleteness theorems
    • Each node in the route has a unique prime number, so, we can use a unique product from them to represent a road, and it is order-independent, which means 1,2,4 equal 4,2,1
    • You also could use another method like intersection or union to represent it.

Reference

  • Related