How to extract group name values at any groupby level in the grouping hierarchy

Time:06-11

I have a Pandas DataFrame df with 15 rows and three columns ['group_A', 'group_B', 'group_C'], each denoting a grouping with two possible values, e.g. group_A takes the values a1 and a2. The grouping hierarchy is as follows:

>>display(df.groupby(['group_A','group_B','group_C']).size())

group_A  group_B  group_C   row_count
a1       b1       c1            1
         b2       c1            2
                  c2            2
a2       b1       c1            1
         b2       c1            1
                  c2            8

For example, 8 rows of the DataFrame have group values ['a2', 'b2', 'c2'], and no rows have ['a1', 'b1', 'c2'].

I can extract the group-level membership sizes using .size(). So at the final grouping, I can get the row_count values [1,2,2,1,1,8]. How do I also extract the group values at the same grouping level, which I expect to be [c1,c1,c2,c1,c1,c2] in the order they appear in the hierarchy?

CodePudding user response:

Calling size() on the groupby returns a Series indexed by a MultiIndex, which you can convert to an array to extract the group names.

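import numpy as np

# The index of the size() result holds one (group_A, group_B, group_C) tuple per group
multi_index = df.groupby(["group_A", "group_B", "group_C"]).size().index
arr = np.array(multi_index)               # 1-D object array of tuples
arr = np.apply_along_axis(list, 0, arr)   # 2-D array with one column per level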

[In] arr[:, 2]
[Out] array(['c1', 'c1', 'c2', 'c1', 'c1', 'c2'], dtype='<U2')
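A shorter route is probably MultiIndex.get_level_values, which returns the labels of a single level directly and skips the NumPy conversion. The sketch below is only illustrative: the DataFrame is rebuilt from the row counts shown in the question, so its construction is an assumption, not the asker's actual data.

```python
import pandas as pd

# Assumed data: 15 rows reconstructed from the row counts in the question.
rows = (
    [("a1", "b1", "c1")] * 1
    + [("a1", "b2", "c1")] * 2
    + [("a1", "b2", "c2")] * 2
    + [("a2", "b1", "c1")] * 1
    + [("a2", "b2", "c1")] * 1
    + [("a2", "b2", "c2")] * 8
)
df = pd.DataFrame(rows, columns=["group_A", "group_B", "group_C"])

sizes = df.groupby(["group_A", "group_B", "group_C"]).size()

# Group sizes and the matching group_C labels, in the same (sorted) order.
row_counts = sizes.to_numpy()
group_c = sizes.index.get_level_values("group_C").to_numpy()

print(row_counts.tolist())  # [1, 2, 2, 1, 1, 8]
print(group_c.tolist())     # ['c1', 'c1', 'c2', 'c1', 'c1', 'c2']
```

Passing "group_A" or "group_B" instead gives the labels of the outer levels, aligned with the same row_count values.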

CodePudding user response:

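If you just want the values of the column, group by all three columns and turn the index levels back into ordinary columns with reset_index(). (Indexing the size() result with ['group_C'] directly raises a KeyError, because group_C is an index level there, not a column.)

df_grouped = df.groupby(['group_A','group_B','group_C']).size().reset_index(name='row_count')
print(df_grouped['group_C'].values)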
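To get the names at any depth of the hierarchy, not only the last one, the same idea can be wrapped in a small function. This is a minimal sketch: names_and_counts is a hypothetical helper written for illustration, and the DataFrame is again reconstructed from the counts in the question (an assumption).

```python
import pandas as pd

# Assumed data: 15 rows reconstructed from the row counts in the question.
rows = (
    [("a1", "b1", "c1")] * 1
    + [("a1", "b2", "c1")] * 2
    + [("a1", "b2", "c2")] * 2
    + [("a2", "b1", "c1")] * 1
    + [("a2", "b2", "c1")] * 1
    + [("a2", "b2", "c2")] * 8
)
df = pd.DataFrame(rows, columns=["group_A", "group_B", "group_C"])


def names_and_counts(frame, levels):
    """Hypothetical helper: labels of the deepest level in `levels`, plus group sizes."""
    # reset_index turns the index levels into columns next to the counts.
    flat = frame.groupby(levels).size().reset_index(name="row_count")
    return flat[levels[-1]].tolist(), flat["row_count"].tolist()


# Grouping only down to group_B:
print(names_and_counts(df, ["group_A", "group_B"]))
# (['b1', 'b2', 'b1', 'b2'], [1, 4, 1, 9])

# Grouping down to group_C, as in the question:
print(names_and_counts(df, ["group_A", "group_B", "group_C"]))
# (['c1', 'c1', 'c2', 'c1', 'c1', 'c2'], [1, 2, 2, 1, 1, 8])
```

Because reset_index keeps every level as a column, the intermediate flat DataFrame also lets you read off the full (group_A, group_B, group_C) combination for each count if you need it.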