Home > front end >  Divide Dataframe Multiindex level 1 by the sum of level 0
Divide Dataframe Multiindex level 1 by the sum of level 0

Time:05-25

I have created a DataFrame like this:

df = pd.DataFrame(
    {
        'env': ['us', 'us', 'us', 'eu'],
        'name': ['first', 'first', 'first', 'second'],
        'default_version': ['2.0.1','2.0.1','2.0.1', '2.1.1'],
        'version': ['2.2.1', '2.2.2.4', '2.3', '2.2.24'],
        'count_events': [1, 8, 102, 244],
        'unique_users': [1, 3, 72, 111]
    }
    )
df = df.pivot_table(index=['env', 'name', 'default_version'], \
    columns='version', values=['count_events', 'unique_users'], aggfunc=np.sum)

Next I'm looking for is to find sum of all count_events at level=1 and sum of all unique_users at level=1, so I can find percentage of count_events and unique_users in each version.

I have generated the sum with the following code, but I don't know how to generate the %.

sums = df.sum(level=0, axis=1)
sums.columns = pd.MultiIndex.from_product([sums.columns, ['SUM']])
final_result = pd.concat([df, sums], axis=1)  

It would not be a problem to change the sum code if necessary.

CodePudding user response:

You can reindex your sums to match the shape of the original data using a combination of reindex and set_axis:

In [14]: fraction = (
    ...:     df / (
    ...:         sums
    ...:         .reindex(df.columns.get_level_values(0), axis=1)
    ...:         .set_axis(df.columns, axis=1)
    ...:     )
    ...: ).fillna(0)

In [15]: fraction
Out[15]:
                           count_events                            unique_users
version                           2.2.1   2.2.2.4 2.2.24       2.3        2.2.1   2.2.2.4 2.2.24       2.3
env name   default_version
eu  second 2.1.1               0.000000  0.000000    1.0  0.000000     0.000000  0.000000    1.0  0.000000
us  first  2.0.1               0.009009  0.072072    0.0  0.918919     0.013158  0.039474    0.0  0.947368
  • Related