Calculation of relative size within a group in pandas DataFrame

I have a DataFrame `size` with daily sizes per ID. Using the group assignments in a second DataFrame, `group`, I want to calculate each ID's size relative to the total of its group on each day.

import numpy as np
import pandas as pd

size = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    '01K W':[10, 15, 20, 18, 13], 
    '02K W':[14, 24, 'nan', 'nan', 18], 
    '03K W':[13, 12, 16, 21, 19], 
    '04K W':[16, 25, 15, 31, 20]}) 
size = size.replace('nan', np.nan)

group = pd.DataFrame({
    'ID':['01K W', '02K W', '03K W', '04K W'],
    'Group':['Group1', 'Group1', 'Group2', 'Group1']})

What I want is a DataFrame like `result`, with the relative weights assigned (a missing size counts as 0):

result = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    '01K W':[0.25, 0.23, 0.57, 0.37, 0.25], 
    '02K W':[0.35, 0.38, 0, 0, 0.35], 
    '03K W':[1, 1, 1, 1, 1], 
    '04K W':[0.4, 0.39, 0.43, 0.63, 0.39]})
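To make the target concrete, each value in `result` is that ID's size divided by the sum of its group's sizes on the same day. The plain-Python check below reproduces the first row (2021-01-01); the dictionaries `day` and `groups` are illustrative copies of that row and of `group`, not part of the original code.

```python
# Worked check of the first row of `result` (2021-01-01).
day = {"01K W": 10, "02K W": 14, "03K W": 13, "04K W": 16}
groups = {"01K W": "Group1", "02K W": "Group1", "03K W": "Group2", "04K W": "Group1"}

# Sum the day's sizes per group: Group1 = 10 + 14 + 16 = 40, Group2 = 13
group_sum = {}
for id_, value in day.items():
    g = groups[id_]
    group_sum[g] = group_sum.get(g, 0) + value

# Divide each size by the total of its own group
relative = {id_: value / group_sum[groups[id_]] for id_, value in day.items()}
print(relative)
```

This gives 0.25, 0.35, 1.0 and 0.4, matching the first row of `result`.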

Is there an efficient way to do this for a big dataset? Thanks!

Answer:

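Replace the "nan" strings with 0: as strings they keep the column from being numeric, so the sum fails, and the expected result shows 0 for missing days anyway. Then transpose so each ID becomes a row, attach each ID's group, and use `groupby(...).transform` to divide every value by its group's daily total: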

import pandas as pd

size = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    '01K W':[10, 15, 20, 18, 13], 
    '02K W':[14, 24, 'nan', 'nan', 18], 
    '03K W':[13, 12, 16, 21, 19], 
    '04K W':[16, 25, 15, 31, 20]}) 

group = pd.DataFrame({
    'ID':['01K W', '02K W', '03K W', '04K W'],
    'Group':['Group1', 'Group1', 'Group2', 'Group1']}) 

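# Turn the "nan" strings into 0 and make every column numeric, then put IDs on the rows
size = size.replace('nan', 0).set_index("Date").astype(float).T
# Look up each ID's group by name instead of relying on the row order of `group`
size["Groups"] = size.index.map(group.set_index("ID")["Group"])

print(size.groupby("Groups").transform(lambda x: x / x.sum()).T)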

Output:

               01K W     02K W  03K W     04K W
Date                                           
2021-01-01  0.250000  0.350000    1.0  0.400000
2021-01-02  0.234375  0.375000    1.0  0.390625
2021-01-03  0.571429  0.000000    1.0  0.428571
2021-01-04  0.367347  0.000000    1.0  0.632653
2021-01-05  0.254902  0.352941    1.0  0.392157
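For a big dataset, a variant worth trying is to compute the group totals with the built-in `"sum"` aggregation and divide the whole frame in one vectorized step. Passing the name `"sum"` to `transform` lets pandas use its optimized grouped sum instead of calling a Python lambda once per group, which is typically faster on large frames. This sketch keeps the missing sizes as `np.nan` (skipped by the sum) and fills them with 0 only at the end; the names `values`, `id_to_group`, `col_groups` and `group_totals` are illustrative.

```python
import numpy as np
import pandas as pd

size = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    '01K W': [10, 15, 20, 18, 13],
    '02K W': [14, 24, np.nan, np.nan, 18],
    '03K W': [13, 12, 16, 21, 19],
    '04K W': [16, 25, 15, 31, 20]})
group = pd.DataFrame({
    'ID': ['01K W', '02K W', '03K W', '04K W'],
    'Group': ['Group1', 'Group1', 'Group2', 'Group1']})

values = size.set_index('Date')  # rows: dates, columns: IDs

# Map each ID column to its group by name (order of `group` does not matter)
id_to_group = group.set_index('ID')['Group']
col_groups = values.columns.map(id_to_group)

# Transpose so IDs are rows, sum per group (NaN is skipped), and
# broadcast each group total back to every member; transpose back
group_totals = values.T.groupby(col_groups).transform('sum').T

# Missing sizes are still NaN after the division; report them as 0
relative = (values / group_totals).fillna(0)
print(relative.round(2))
```

The values match the output above; for example, on 2021-01-03 the Group1 total is 20 + 15 = 35, so `01K W` gets 20 / 35 ≈ 0.57 and `02K W` gets 0.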