column multiplication based on a mapping-CodePudding

I have the following two dataframes. The first one, maps some nodes to area number and the maximum electric load of that node.

bus = pd.DataFrame(data={'Node':[101, 102, 103, 104, 105], 'Area':[1, 1, 2, 2, 3], 'Load':[10, 15, 12, 20, 25]})

which gives us:

   Bus  Area  Load
0  101     1    10
1  102     1    15
2  103     2    12
3  104     2    20
4  105     3    25

The second dataframe, shows the total electric load of each area over a time period (from hour 0 to 5). The column names are the areas (matching the column Area in dataframe bus.

load = pd.DataFrame(data={1:[20, 18, 17, 19, 22, 25], 2:[23, 25,24, 27, 30, 32], 3:[10, 14, 19, 25, 22, 20]})

which gives us:

    1   2   3
0  20  23  10
1  18  25  14
2  17  24  19
3  19  27  25
4  22  30  22
5  25  32  20

I would like to have a dataframe that shows the electric load of each bus over the 6 hours. Assumption: The percentage of the load over time is the same as the percentage of the maximum load shown in bus; e.g., bus 101 has 10/(10 15)=0.4 percent of the electric load of area 1, therefore, to calculate its hourly load, 10/(10 15) should be multiplied by the column corresponding to area 1 in load.

The desired output should be of the following format:

    101     102     103     104     105
0   8       12      8.625   14.375  10
1   7.2     10.8    9.375   15.625  14
2   6.8     10.2    9       15      19
3   7.6     11.4    10.125  16.875  25
4   8.8     13.2    11.25   18.75   22
5   10      15      12      20      20

For column 101, we have 0.4 multiplied by column 1 of load.

Any help is greatly appreaciated.

CodePudding user response：

You can calculate the ratio in bus, transpose load, merge the two and multiply the ratio by the load, here goes:

bus['area_sum'] = bus.groupby('Area')['Load'].transform('sum')
bus['node_ratio'] = bus['Load'] / bus['area_sum']

full_data = bus.merge(load.T.reset_index(), left_on='Area', right_on='index')

result = pd.DataFrame([full_data['node_ratio'] * full_data[x] for x in range(6)])
result.columns = full_data['Node'].values

result:

	101	102	103	104	105
0	8	12	8.625	14.375	10
1	7.2	10.8	9.375	15.625	14
2	6.8	10.2	9	15	19
3	7.6	11.4	10.125	16.875	25
4	8.8	13.2	11.25	18.75	22
5	10	15	12	20	20

CodePudding user response：

One option is to get the Load divided by the sum, then pivot, get the index matching for both load and bus, before multiplying on the matching levels:

(bus.assign(Load = bus.Load.div(bus.groupby('Area').Load.transform('sum')))
    .pivot(None, ['Area', 'Node'], 'Load')
    .reindex(load.index)
    .ffill() # get the data spread into all rows
    .bfill()
    .mul(load, level=0)
    .droplevel(0,1)
    .rename_axis(columns=None)
) 
    101   102     103     104   105
0   8.0  12.0   8.625  14.375  10.0
1   7.2  10.8   9.375  15.625  14.0
2   6.8  10.2   9.000  15.000  19.0
3   7.6  11.4  10.125  16.875  25.0
4   8.8  13.2  11.250  18.750  22.0
5  10.0  15.0  12.000  20.000  20.0