How to perform operations with columns from different datasets with different indexation?

Time:07-23

The goal

A bit of background, to get familiar with variables and understand what the problem is:

  1. floor, square, matc and volume are tables (dataframes) that all share the column "id" (which simply runs from 1 to 100), so every row is unique;
  2. floor and square also share the column "room_name";
  3. volume is essentially floor with every row dropped whose room ("room_name") has no value in the "square" column of the square dataframe, so some values of "id" are missing from it.

With that done, I needed to create a new column in the volume dataframe: the product of one of its own columns with two columns from the matc and square dataframes.

The problem

This seemingly simple operation turned out to be quite difficult, because the columns I am working with have different lengths (except for square and matc, which are the same length) and I need to align them by "id". To make matters worse, calling volume['coefLoosening'] directly (note that coefLoosening does not originate from floor; it is added after the table is created) returns a series with its own index and no apparent way to relate it to "id".
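To illustrate the alignment behaviour described above, here is a minimal sketch on toy data (the three small frames and their values are invented for illustration and are not the real tables). pandas aligns Series arithmetic on index labels rather than on position, so moving "id" into the index should let columns of different lengths line up.

```python
import pandas as pd

# Toy stand-ins for the real tables (values invented for illustration)
volume = pd.DataFrame({"id": [1, 2, 4], "coefLoosening": [1.0, 2.0, 3.0]})
matc = pd.DataFrame({"id": [1, 2, 3, 4], "width": [10.0, 20.0, 30.0, 40.0]})
square = pd.DataFrame({"id": [1, 2, 3, 4], "square": [2.0, 3.0, 4.0, 5.0]})

# Make "id" the index so arithmetic aligns on it instead of the default 0..n-1 labels
v = volume.set_index("id")
m = matc.set_index("id")
s = square.set_index("id")

# Ids missing from v (here id 3) give NaN in the product; reindex keeps only v's ids
product = (v["coefLoosening"] * m["width"] * s["square"]).reindex(v.index)
print(product.tolist())  # prints [20.0, 120.0, 600.0]
```

The result is indexed by "id", so it can be assigned back with v["tempCoef"] = product if the frame is kept indexed by "id".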

What I attempted

Whilst trying to solve the issue, I came up with this abomination:

volume = volume.merge(pd.DataFrame({"id": matc.loc[matc["id"].isin(volume["id"])]["id"], "tempCoef": volume['coefLoosening'] * matc.loc[matc["id"].isin(volume["id"])]['width'] * square.loc[square["id"].isin(volume["id"])]['square']}), how = "left", on = ["id"])

This, however, misaligns the "id" column completely and somehow creates more rows. For instance, this is the output:

index   id      tempCoef
0       1.0     960.430612244898
1       2.0     4665.499999999999
2       NaN     NaN
3       4.0     2425.44652173913
4       5.0     5764.964210526316
5       6.0     55201.68727272727
6       NaN     NaN
7       NaN     NaN
8       NaN     NaN
9       10.0    1780.7208791208789
10    11.0      6075.385074626865
11    12.0      10400.94
12    13.0      31.378285714285713
13    NaN       NaN
14    NaN       NaN
15    NaN       NaN
16    17.0      10505.431451612903
17    18.0      1208.994845360825
18    NaN       NaN
19    NaN       NaN
20    21.0      568.8900000000001
21    22.0      4275.416470588235
22    NaN       NaN
23    NaN       NaN
24    25.0      547.04
25    26.0      2090.666111111111
26    27.0      2096.88406779661
27    NaN       NaN
28    29.0      8324.566547619048
29    NaN       NaN
30    NaN       NaN
31    NaN       NaN
32    33.0      2459.8314736842103
33    34.0      2177.778461538461
34    35.0      166.1257142857143
35    36.0      1866.8492307692304
36    37.0      3598.1470588235293
37    38.0      21821.709411764703
38    NaN       NaN
39    40.0      2999.248
40    41.0      980.3136
41    42.0      2641.3503947368426
42    NaN       NaN
43    44.0      25829.878148148146
44    45.0      649.3632
45    46.0      10895.386666666667
46    NaN       NaN
47    NaN       NaN
48    49.0      825.9879310344828
49    50.0      15951.941666666671
50    51.0      2614.9343434343436
51    52.0      2462.30625
52    NaN       NaN
53    NaN       NaN
54    55.0      1366.8287671232877
55    56.0      307.38
56    57.0      11601.975
57    58.0      1002.5415730337081
58    59.0      2493.4532432432434
59    60.0      981.7482608695652
61    62.0      NaN
63    64.0      NaN
65    66.0      NaN
67    68.0      NaN
73    74.0      NaN
75    76.0      NaN
76    77.0      NaN
77    78.0      NaN
78    79.0      NaN
80    81.0      NaN
82    83.0      NaN  
84    85.0      NaN  
88    89.0      NaN  
89    90.0      NaN  
90    91.0      NaN  
92    93.0      NaN  
94    95.0      NaN  
95    96.0      NaN  
97    98.0      NaN  
98    99.0      NaN  
99    100.0     NaN      

To be clear, none of the columns used in the operation contain NaN values.
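One plausible reading of the output above, sketched on invented toy data: when pd.DataFrame is built from a dict of Series whose indexes differ, the constructor uses the union of those indexes and fills the gaps with NaN. That would account for both the extra rows and the NaNs even though no input column contains NaN.

```python
import pandas as pd

# Toy data (invented): volume has a fresh 0..2 index, matc keeps its original labels
volume = pd.DataFrame({"id": [1, 2, 4], "coefLoosening": [1.0, 2.0, 3.0]})
matc = pd.DataFrame({"id": [1, 2, 3, 4], "width": [10.0, 20.0, 30.0, 40.0]})

# Boolean filtering keeps the original row labels: 0, 1, 3
filtered = matc.loc[matc["id"].isin(volume["id"])]

frame = pd.DataFrame({
    "id": filtered["id"],  # labels 0, 1, 3
    # volume labels are 0, 1, 2, so the product has labels 0, 1, 2, 3
    "tempCoef": volume["coefLoosening"] * filtered["width"],
})
print(frame)  # four rows, although volume has only three
```

Only labels 0 and 1 exist on both sides of the multiplication, so only those two rows get a value.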

This is what `volume["coefLoosening"]` returns:

0      1.020408
1      1.515152
2      2.000000
3      4.347826
4      5.263158
5      9.090909
6      1.162791
7      1.149425
8      1.851852
9      1.098901
10     1.492537
11     2.083333
12     1.428571
13     1.010101
14     1.562500
15     3.448276
16     1.612903
17     1.030928
18    33.333333
19     1.000000
20     1.123596
21     1.960784
22     2.127660
23     2.857143
24     1.369863
25     1.111111
26     1.694915
27     1.492537
28     1.190476
29     1.818182
30     1.612903
31    12.500000
32     1.052632
33     3.846154
34     2.040816
35     1.098901
36     2.941176
37     2.941176
38     2.857143
39     1.111111
40     1.333333
41     1.315789
42     3.703704
43     3.703704
44     2.000000
45    33.333333
46    12.500000
47     1.149425
48     1.724138
49     4.166667
50     1.010101
51     1.041667
52     1.162791
53     3.225806
54     1.369863
55     1.666667
56     4.545455
57     1.123596
58     1.351351
59     2.173913

And finally, this is what volume["id"] returns (for comparison with the result of the «abomination»):

0       1
1       2
2       4
3       5
4       6
5      10
6      11
7      12
8      13
9      17
10     18
11     21
12     22
13     25
14     26
15     27
16     29
17     33
18     34
19     35
20     36
21     37
22     38
23     40
24     41
25     42
26     44
27     45
28     46
29     49
30     50
31     51
32     52
33     55
34     56
35     57
36     58
37     59
38     60
39     62
40     64
41     66
42     68
43     74
44     76
45     77
46     78
47     79
48     81
49     83
50     85
51     89
52     90
53     91
54     93
55     95
56     96
57     98
58     99
59    100

Some thoughts

I believe part of the problem is how pandas returns columns (as series with the default index), and I don't know how to work around that.

Another source of the problem might be how .loc returns its result. In the case of matc.loc[matc["id"].isin(volume["id"])]['width'] it is:

0     15.98
1     36.12
3     32.19
4     18.54
5     98.96
9     64.56
10    58.20
11    55.08
12     3.84
16    77.31
17    15.25
20    63.21
21    76.32
24    10.52
25    54.65
26    95.46
28    79.67
32    57.01
33    27.54
34     7.36
35    36.44
36    23.64
37    78.98
39    92.19
40    31.26
41    61.71
43    70.07
44    10.91
45     4.24
48     7.35
49    46.70
50    97.69
51    32.03
54    13.50
55    42.30
56    94.71
57    37.49
58    57.86
59    50.29
61    18.18
63    88.26
65     4.28
67    28.89
73     4.05
75    22.37
76    52.20
77    98.29
78    72.98
80     6.07
82    35.80
84    64.16
88    23.60
89    45.05
90    21.14
92    31.21
94    46.04
95     7.15
97    27.70
98    31.93
99    79.62

which is shifted by -1 and I don't see a way to change this manually.
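A small sketch of what is likely happening here, on invented toy data: boolean filtering with .loc keeps the original row labels, so the index shown is the row's label in matc (which happens to equal id - 1), not a shifted copy of "id". reset_index(drop=True) renumbers the rows, and set_index("id") makes the labels the ids themselves. The keep_ids list is a hypothetical stand-in for volume["id"].

```python
import pandas as pd

matc = pd.DataFrame({"id": [1, 2, 3, 4], "width": [10.0, 20.0, 30.0, 40.0]})
keep_ids = [1, 2, 4]  # hypothetical ids that survive the filter

# Filtering keeps the labels the rows had in matc
filtered = matc.loc[matc["id"].isin(keep_ids), "width"]
print(filtered.index.tolist())    # [0, 1, 3]

# Renumber the rows 0..n-1, discarding the old labels
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# Or label the rows by "id" and select by id directly
by_id = matc.set_index("id").loc[keep_ids, "width"]
print(by_id.index.tolist())       # [1, 2, 4]
```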

So, any ideas? Maybe a similar question has already been answered (I searched before asking, but found nothing)?

Data

The minimal columns of each table required to replicate this (because Stack Overflow does not allow files to be uploaded):

volume:


index,id,room_name,coefLoosening
0,1,6,1.0204081632653061
1,2,7,1.5151515151515151
2,4,3,2.0
3,5,7,4.3478260869565215
4,6,4,5.2631578947368425
5,10,7,9.090909090909092
6,11,5,1.1627906976744187
7,12,4,1.1494252873563218
8,13,1,1.8518518518518516
9,17,3,1.0989010989010988
10,18,3,1.4925373134328357
11,21,3,2.0833333333333335
12,22,7,1.4285714285714286
13,25,3,1.0101010101010102
14,26,6,1.5625
15,27,6,3.4482758620689657
16,29,4,1.6129032258064517
17,33,2,1.0309278350515465
18,34,2,33.333333333333336
19,35,5,1.0
20,36,4,1.1235955056179776
21,37,2,1.9607843137254901
22,38,6,2.127659574468085
23,40,5,2.857142857142857
24,41,6,1.36986301369863
25,42,3,1.1111111111111112
26,44,2,1.6949152542372883
27,45,4,1.4925373134328357
28,46,2,1.1904761904761905
29,49,5,1.8181818181818181
30,50,4,1.6129032258064517
31,51,2,12.5
32,52,3,1.0526315789473684
33,55,6,3.846153846153846
34,56,5,2.0408163265306123
35,57,5,1.0989010989010988
36,58,4,2.941176470588235
37,59,5,2.941176470588235
38,60,5,2.857142857142857
39,62,7,1.1111111111111112
40,64,7,1.3333333333333333
41,66,7,1.3157894736842106
42,68,3,3.7037037037037033
43,74,5,3.7037037037037033
44,76,4,2.0
45,77,3,33.333333333333336
46,78,4,12.5
47,79,5,1.1494252873563218
48,81,5,1.7241379310344829
49,83,4,4.166666666666667
50,85,2,1.0101010101010102
51,89,4,1.0416666666666667
52,90,1,1.1627906976744187
53,91,2,3.2258064516129035
54,93,2,1.36986301369863
55,95,1,1.6666666666666667
56,96,4,4.545454545454546
57,98,7,1.1235955056179776
58,99,7,1.3513513513513513
59,100,5,2.1739130434782608

matc:

index,id,width
0,1,15.98
1,2,36.12
2,3,63.41
3,4,32.19
4,5,18.54
5,6,98.96
6,7,5.65
7,8,97.42
8,9,50.88
9,10,64.56
10,11,58.2
11,12,55.08
12,13,3.84
13,14,75.87
14,15,96.51
15,16,42.08
16,17,77.31
17,18,15.25
18,19,81.43
19,20,98.71
20,21,63.21
21,22,76.32
22,23,22.59
23,24,30.79
24,25,10.52
25,26,54.65
26,27,95.46
27,28,49.93
28,29,79.67
29,30,45.0
30,31,59.14
31,32,62.25
32,33,57.01
33,34,27.54
34,35,7.36
35,36,36.44
36,37,23.64
37,38,78.98
38,39,47.8
39,40,92.19
40,41,31.26
41,42,61.71
42,43,93.11
43,44,70.07
44,45,10.91
45,46,4.24
46,47,35.39
47,48,99.1
48,49,7.35
49,50,46.7
50,51,97.69
51,52,32.03
52,53,48.61
53,54,33.44
54,55,13.5
55,56,42.3
56,57,94.71
57,58,37.49
58,59,57.86
59,60,50.29
60,61,77.98
61,62,18.18
62,63,3.42
63,64,88.26
64,65,48.66
65,66,4.28
66,67,20.78
67,68,28.89
68,69,27.17
69,70,57.48
70,71,59.07
71,72,12.63
72,73,22.06
73,74,4.05
74,75,22.3
75,76,22.37
76,77,52.2
77,78,98.29
78,79,72.98
79,80,49.37
80,81,6.07
81,82,28.85
82,83,35.8
83,84,66.74
84,85,64.16
85,86,33.64
86,87,66.36
87,88,34.51
88,89,23.6
89,90,45.05
90,91,21.14
91,92,97.27
92,93,31.21
93,94,13.04
94,95,46.04
95,96,7.15
96,97,47.87
97,98,27.7
98,99,31.93
99,100,79.62

square:

index,id,room_name,square
0,1,5,58.9
1,2,3,85.25
2,3,5,90.39
3,4,3,17.33
4,5,2,59.08
5,6,4,61.36
6,7,2,29.02
7,8,2,59.63
8,9,6,98.31
9,10,4,25.1
10,11,3,69.94
11,12,7,90.64
12,13,4,5.72
13,14,6,29.96
14,15,4,59.06
15,16,1,41.85
16,17,7,84.25
17,18,4,76.9
18,19,1,17.2
19,20,4,60.9
20,21,1,8.01
21,22,2,28.57
22,23,1,65.07
23,24,1,20.24
24,25,7,37.96
25,26,7,34.43
26,27,3,12.96
27,28,6,80.96
28,29,5,87.77
29,30,2,95.67
30,31,1,10.4
31,32,1,30.96
32,33,6,40.99
33,34,7,20.56
34,35,5,11.06
35,36,4,46.62
36,37,3,51.75
37,38,4,93.94
38,39,5,62.64
39,40,6,29.28
40,41,3,23.52
41,42,6,32.53
42,43,1,33.3
43,44,3,99.53
44,45,5,29.76
45,46,7,77.09
46,47,1,71.31
47,48,2,59.22
48,49,1,65.18
49,50,7,81.98
50,51,7,26.5
51,52,3,73.8
52,53,6,78.52
53,54,6,69.67
54,55,6,73.91
55,56,6,4.36
56,57,5,26.95
57,58,2,23.8
58,59,2,31.89
59,60,1,8.98
60,61,1,88.76
61,62,5,88.75
62,63,4,44.94
63,64,4,81.13
64,65,5,48.39
65,66,3,55.63
66,67,7,46.28
67,68,3,40.85
68,69,7,54.37
69,70,3,14.01
70,71,6,20.13
71,72,2,90.67
72,73,3,4.28
73,74,4,56.18
74,75,3,74.8
75,76,5,10.34
76,77,6,15.94
77,78,2,29.4
78,79,6,60.8
79,80,3,13.05
80,81,3,49.46
81,82,1,75.76
82,83,1,84.27
83,84,5,76.36
84,85,3,75.98
85,86,7,77.81
86,87,2,56.34
87,88,1,43.93
88,89,5,30.64
89,90,5,55.78
90,91,5,88.26
91,92,6,15.11
92,93,1,20.64
93,94,2,5.08
94,95,1,82.31
95,96,4,76.92
96,97,1,53.47
97,98,2,2.7
98,99,7,77.12
99,100,4,29.43

floor:

index,id,room_name
0,1,6
1,2,7
2,3,12
3,4,3
4,5,7
5,6,4
6,7,8
7,8,11
8,9,10
9,10,7
10,11,5
11,12,4
12,13,1
13,14,11
14,15,12
15,16,9
16,17,3
17,18,3
18,19,9
19,20,12
20,21,3
21,22,7
22,23,8
23,24,12
24,25,3
25,26,6
26,27,6
27,28,10
28,29,4
29,30,10
30,31,9
31,32,11
32,33,2
33,34,2
34,35,5
35,36,4
36,37,2
37,38,6
38,39,11
39,40,5
40,41,6
41,42,3
42,43,11
43,44,2
44,45,4
45,46,2
46,47,9
47,48,12
48,49,5
49,50,4
50,51,2
51,52,3
52,53,9
53,54,10
54,55,6
55,56,5
56,57,5
57,58,4
58,59,5
59,60,5
60,61,12
61,62,7
62,63,12
63,64,7
64,65,11
65,66,7
66,67,12
67,68,3
68,69,8
69,70,11
70,71,12
71,72,8
72,73,12
73,74,5
74,75,11
75,76,4
76,77,3
77,78,4
78,79,5
79,80,12
80,81,5
81,82,12
82,83,4
83,84,8
84,85,2
85,86,8
86,87,8
87,88,9
88,89,4
89,90,1
90,91,2
91,92,9
92,93,2
93,94,12
94,95,1
95,96,4
96,97,8
97,98,7
98,99,7
99,100,5

Answer:

If I understand correctly, you overcomplicated things. The point of merging on "id" is that you don't need to filter the other dataframes by "id" beforehand with loc and isin, as you tried to do; merge does that for you.
You can multiply square by width in square_df (matc_df would work equally well, since the two have the same length and the same ids). Then merge this new column into volume_df, which keeps the product only for the ids found in volume_df, and multiply by coefLoosening. Here square_df, matc_df and volume_df are the question's square, matc and volume.

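# Row-wise product; relies on square_df and matc_df sharing the same index and "id" order
square_df['square*width'] = square_df['square'] * matc_df['width']
# The left merge keeps only the ids present in volume_df
df = volume_df.merge(square_df[['id', 'square*width']], on='id', how='left')
df['result'] = df['coefLoosening'] * df['square*width']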

Output df:

     id  room_name  coefLoosening  square*width        result
0     1          6       1.020408      941.2220    960.430612
1     2          7       1.515152     3079.2300   4665.500000
2     4          3       2.000000      557.8527   1115.705400
3     5          7       4.347826     1095.3432   4762.361739
4     6          4       5.263158     6072.1856  31958.871579
5    10          7       9.090909     1620.4560  14731.418182
6    11          5       1.162791     4070.5080   4733.148837
7    12          4       1.149425     4992.4512   5738.449655
8    13          1       1.851852       21.9648     40.675556
9    17          3       1.098901     6513.3675   7157.546703
10   18          3       1.492537     1172.7250   1750.335821
11   21          3       2.083333      506.3121   1054.816875
12   22          7       1.428571     2180.4624   3114.946286
13   25          3       1.010101      399.3392    403.372929
14   26          6       1.562500     1881.5995   2939.999219
15   27          6       3.448276     1237.1616   4266.074483
16   29          4       1.612903     6992.6359  11278.445000
17   33          2       1.030928     2336.8399   2409.113299
18   34          2      33.333333      566.2224  18874.080000
19   35          5       1.000000       81.4016     81.401600
20   36          4       1.123596     1698.8328   1908.800899
21   37          2       1.960784     1223.3700   2398.764706
22   38          6       2.127660     7419.3812  15785.917447
23   40          5       2.857143     2699.3232   7712.352000
24   41          6       1.369863      735.2352   1007.171507
25   42          3       1.111111     2007.4263   2230.473667
26   44          2       1.694915     6974.0671  11820.452712
27   45          4       1.492537      324.6816    484.599403
28   46          2       1.190476      326.8616    389.120952
29   49          5       1.818182      479.0730    871.041818
30   50          4       1.612903     3828.4660   6174.945161
31   51          2      12.500000     2588.7850  32359.812500
32   52          3       1.052632     2363.8140   2488.225263
33   55          6       3.846154      997.7850   3837.634615
34   56          5       2.040816      184.4280    376.383673
35   57          5       1.098901     2552.4345   2804.873077
36   58          4       2.941176      892.2620   2624.300000
37   59          5       2.941176     1845.1554   5426.927647
38   60          5       2.857143      451.6042   1290.297714
39   62          7       1.111111     1613.4750   1792.750000
40   64          7       1.333333     7160.5338   9547.378400
41   66          7       1.315789      238.0964    313.284737
42   68          3       3.703704     1180.1565   4370.950000
43   74          5       3.703704      227.5290    842.700000
44   76          4       2.000000      231.3058    462.611600
45   77          3      33.333333      832.0680  27735.600000
46   78          4      12.500000     2889.7260  36121.575000
47   79          5       1.149425     4437.1840   5100.211494
48   81          5       1.724138      300.2222    517.624483
49   83          4       4.166667     3016.8660  12570.275000
50   85          2       1.010101     4874.8768   4924.117980
51   89          4       1.041667      723.1040    753.233333
52   90          1       1.162791     2512.8890   2921.963953
53   91          2       3.225806     1865.8164   6018.762581
54   93          2       1.369863      644.1744    882.430685
55   95          1       1.666667     3789.5524   6315.920667
56   96          4       4.545455      549.9780   2499.900000
57   98          7       1.123596       74.7900     84.033708
58   99          7       1.351351     2462.4416   3327.623784
59  100          5       2.173913     2343.2166   5093.949130

