I have created a DataFrame like this:
df = pd.DataFrame(
{
'env': ['us', 'us', 'us', 'eu'],
'name': ['first', 'first', 'first', 'second'],
'default_version': ['2.0.1','2.0.1','2.0.1', '2.1.1'],
'version': ['2.2.1', '2.2.2.4', '2.3', '2.2.24'],
'count_events': [1, 8, 102, 244],
'unique_users': [1, 3, 72, 111]
}
)
df = df.pivot_table(index=['env', 'name', 'default_version'], \
columns='version', values=['count_events', 'unique_users'], aggfunc=np.sum)
Next I'm looking for is to find sum of all count_events at level=1 and sum of all unique_users at level=1, so I can find percentage of count_events and unique_users in each version.
I have generated the sum with the following code, but I don't know how to generate the %.
sums = df.sum(level=0, axis=1)
sums.columns = pd.MultiIndex.from_product([sums.columns, ['SUM']])
final_result = pd.concat([df, sums], axis=1)
It would not be a problem to change the sum code if necessary.
CodePudding user response:
You can reindex your sums to match the shape of the original data using a combination of reindex
and set_axis
:
In [14]: fraction = (
...: df / (
...: sums
...: .reindex(df.columns.get_level_values(0), axis=1)
...: .set_axis(df.columns, axis=1)
...: )
...: ).fillna(0)
In [15]: fraction
Out[15]:
count_events unique_users
version 2.2.1 2.2.2.4 2.2.24 2.3 2.2.1 2.2.2.4 2.2.24 2.3
env name default_version
eu second 2.1.1 0.000000 0.000000 1.0 0.000000 0.000000 0.000000 1.0 0.000000
us first 2.0.1 0.009009 0.072072 0.0 0.918919 0.013158 0.039474 0.0 0.947368