Given a grouped DataFrame (obtained by df.groupby([col1, col2])), I would like to obtain the grouping variables (col1 and col2 in this case).
For example, from the GroupBy user guide:
import pandas as pd
import numpy as np

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0),
        ("bird", "Psittaciformes", 24.0),
        ("mammal", "Carnivora", 80.2),
        ("mammal", "Primates", np.nan),
        ("mammal", "Carnivora", 58),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "order", "max_speed"),
)
grouped = df.groupby(["class", "order"])
Given grouped, I would like to get class and order. However, grouped.indices and grouped.groups contain only the values of the keys, not the column names.
The column names must be stored somewhere in the object, because if I run grouped.size(), for example, they are included in the index:
class   order
bird    Falconiformes     1
        Psittaciformes    1
mammal  Carnivora         2
        Primates          1
dtype: int64
And therefore I can run grouped.size().index.names, which returns FrozenList(['class', 'order']). But this performs an unnecessary calculation of .size(). Is there a nicer way of retrieving these from the object?
The ultimate reason I'd like this is so that I can do some processing for a particular group and associate the result with the key-value pairs that define the group. That way I would be able to amalgamate different grouped datasets with arbitrary levels of grouping. For example, I could have:
group                             max_speed
class=bird,order=Falconiformes    389.0
class=bird,order=Psittaciformes   24.0
class=bird                        206.5
foo=bar                           45.5
...
CodePudding user response:
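Very similar to your own suggestion, you can extract the names of the grouping columns with:
grouped.dtypes.index.names
It is no shorter, but it is an attribute access rather than an explicit method call. Note that pandas still builds a per-group result behind it, and DataFrameGroupBy.dtypes has been deprecated since pandas 2.1.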
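As a rough sketch of the end goal described in the question, the names can also be read without aggregating, and the "name=value" labels can be built from the index of an aggregated result. This assumes the grouping was specified by column labels: the keys attribute simply holds whatever was passed to groupby(), so it would not give column names for a grouping by a function or a Series. The helper labelled_means is a hypothetical name introduced here for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0),
        ("bird", "Psittaciformes", 24.0),
        ("mammal", "Carnivora", 80.2),
        ("mammal", "Primates", np.nan),
        ("mammal", "Carnivora", 58),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "order", "max_speed"),
)

grouped = df.groupby(["class", "order"])

# Assumption: `keys` stores the argument given to groupby(), so it only
# yields column names when the grouping was specified by column label.
names = list(grouped.keys)


def labelled_means(frame, keys, value="max_speed"):
    """Hypothetical helper: per-group mean of `value`, labelled 'name=value,...'."""
    means = frame.groupby(keys)[value].mean()
    level_names = means.index.names
    labels = []
    for key in means.index:
        # A single grouping column yields scalars; several yield tuples.
        parts = key if isinstance(key, tuple) else (key,)
        labels.append(",".join(f"{n}={v}" for n, v in zip(level_names, parts)))
    return pd.Series(
        means.to_numpy(), index=pd.Index(labels, name="group"), name=value
    )


# Amalgamate results from two different levels of grouping.
combined = pd.concat(
    [labelled_means(df, ["class", "order"]), labelled_means(df, ["class"])]
)

print(names)
print(combined)
```

Here combined is a single Series whose index mixes labels such as class=bird,order=Falconiformes and class=bird, which is the layout sketched in the question. Because the labels are built from the aggregated result's own index names, the helper does not depend on keys at all.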