How to combine two dataframe based on column route?-CodePudding

Column route represents the point for each location

So I have two tables like this, my question is how to calculate the sum of the load difference between the two tables if the route has to match with every location [1,2,3,4,5]

for example:

df1['Route'] = [1,5] and df2['Route'] = [2,3,4] then the result is [[1,5],[2,3,4]] with sum of load difference = 368.93 (-10325.19) = -9956,26

df1

index	Route	Total Distance	Total Demand	Total Time	Total Volume	Load Difference
8	1,5	2257	642.46	173	4631.07	368.93
18	1,3,4	2179	633.65	167	4409.15	590.85
7	1,4	2152	538.2	165	3700.13	1299.87
24	3,4,5	2086	508.09	160	3387.27	1612.73
6	1,3	2146	479.46	165	3296.29	1703.71
0	1	2104	384.01	161	2576.07	2423.93
14	4,5	1775	412.64	136	2338.29	2661.71
13	3,5	2029	353.9	156	2300.35	2699.65
12	3,4	1881	249.64	144	1497.84	3502.16
4	5	1570	258.45	120	1292.25	3707.75
3	4	1570	154.19	120	770.95	4229.05
2	3	1848	95.45	142	564.75	4435.25
19	1,3,5	2327	737.91	179	5503.58	-503.58
20	1,4,5	2357	796.65	181	6008.07	-1008.07
28	1,3,4,5	2384	892.1	183	6802.26	-1802.26
1	2	1960	2510.39	150	15689.94	-10689.94
9	2,3	1986	2605.84	152	16503.65	-11503.65
10	2,4	1994	2664.58	153	16986.7	-11986.7
21	2,3,4	2019	2760.03	155	17825.19	-12825.19
11	2,5	2097	2768.84	161	18574.3	-13574.3
5	1,2	2140	2894.4	164	19778.4	-14778.4
22	2,3,5	2167	2864.29	166	19811.34	-14811.34
23	2,4,5	2199	2923.03	169	20583.0	-15583.0
15	1,2,3	2166	2989.85	166	20679.8	-15679.8
16	1,2,4	2174	3048.59	167	21213.11	-16213.11
29	2,3,4,5	2224	3018.48	171	21506.67	-16506.67
25	1,2,3,4	2199	3144.04	169	22139.28	-17139.28
17	1,2,5	2277	3152.85	175	22989.53	-17989.53
26	1,2,3,5	2347	3248.3	180	24362.25	-19362.25
27	1,2,4,5	2379	3307.04	183	25216.18	-20216.18
30	1,2,3,4,5	2404	3402.49	184	26085.76	-21085.76

df2

index	Route	Total Distance	Total Demand	Total Time	Total Volume	Load Difference
28	1,3,4,5	2384	892.1	183	6802.26	697.74
20	1,4,5	2357	796.65	181	6008.07	1491.93
19	1,3,5	2327	737.91	179	5503.58	1996.42
8	1,5	2257	642.46	173	4631.07	2868.93
18	1,3,4	2179	633.65	167	4409.15	3090.85
7	1,4	2152	538.2	165	3700.13	3799.87
24	3,4,5	2086	508.09	160	3387.27	4112.73
6	1,3	2146	479.46	165	3296.29	4203.71
0	1	2104	384.01	161	2576.07	4923.93
14	4,5	1775	412.64	136	2338.29	5161.71
13	3,5	2029	353.9	156	2300.35	5199.65
12	3,4	1881	249.64	144	1497.84	6002.16
4	5	1570	258.45	120	1292.25	6207.75
3	4	1570	154.19	120	770.95	6729.05
2	3	1848	95.45	142	564.75	6935.25
1	2	1960	2510.39	150	15689.94	-8189.94
9	2,3	1986	2605.84	152	16503.65	-9003.65
10	2,4	1994	2664.58	153	16986.7	-9486.7
21	2,3,4	2019	2760.03	155	17825.19	-10325.19
11	2,5	2097	2768.84	161	18574.3	-11074.3
5	1,2	2140	2894.4	164	19778.4	-12278.4
22	2,3,5	2167	2864.29	166	19811.34	-12311.34
23	2,4,5	2199	2923.03	169	20583.0	-13083.0
15	1,2,3	2166	2989.85	166	20679.8	-13179.8
16	1,2,4	2174	3048.59	167	21213.11	-13713.11
29	2,3,4,5	2224	3018.48	171	21506.67	-14006.67
25	1,2,3,4	2199	3144.04	169	22139.28	-14639.28
17	1,2,5	2277	3152.85	175	22989.53	-15489.53
26	1,2,3,5	2347	3248.3	180	24362.25	-16862.25
27	1,2,4,5	2379	3307.04	183	25216.18	-17716.18
30	1,2,3,4,5	2404	3402.49	184	26085.76	-18585.76

CodePudding user response：

Answer

import numpy as np

prime_map = {k:v for k,v in zip("12345", [2,3,5,7,11])}

def road_key_to_number(x):
    return np.prod(list(prime_map[k] for k in x.split(",")))


df1["key"] = df1["Route"].apply(road_key_to_number)
df2["key"] = df2["Route"].apply(road_key_to_number)
df2["pair_key"] = np.prod(list(prime_map.values())) / df2["key"]
data = df1.merge(df2, left_on = "key", right_on="pair_key")
data["diff"] = (data["Load Difference_x"]   data["Load Difference_y"])

Output

    Route_x  Route_y      diff
0       1,5    2,3,4  -9956.26
1     1,3,4      2,5 -10483.45
2       1,4    2,3,5 -11011.47
3     3,4,5      1,2 -10665.67
4       1,3    2,4,5 -11379.29
5         1  2,3,4,5 -11582.74
6       4,5    1,2,3 -10518.09
7       3,5    1,2,4 -11013.46
8       3,4    1,2,5 -11987.37
9         5  1,2,3,4 -10931.53
10        4  1,2,3,5 -12633.20
11        3  1,2,4,5 -13280.93
12    1,3,5      2,4  -9990.28
13    1,4,5      2,3 -10011.72
14  1,3,4,5        2  -9992.20
15        2  1,3,4,5  -9992.20
16      2,3    1,4,5 -10011.72
17      2,4    1,3,5  -9990.28
18    2,3,4      1,5  -9956.26
19      2,5    1,3,4 -10483.45
20      1,2    3,4,5 -10665.67
21    2,3,5      1,4 -11011.47
22    2,4,5      1,3 -11379.29
23    1,2,3      4,5 -10518.09
24    1,2,4      3,5 -11013.46
25  2,3,4,5        1 -11582.74
26  1,2,3,4        5 -10931.53
27    1,2,5      3,4 -11987.37
28  1,2,3,5        4 -12633.20
29  1,2,4,5        3 -13280.93

Explanation

The key idea is to match two df.Route to calculate the diff between them.
Here prime_map is using a method called Arithmeticization, it has been used in Metamathematics to prove Gödel's incompleteness theorems
- Each node in the route has a unique prime number, so, we can use a unique product from them to represent a road, and it is order-independent, which means 1,2,4 equal 4,2,1
- You also could use another method like intersection or union to represent it.

Answer

Output

Explanation

Reference