I have a Pandas DataFrame df of 15 rows and three columns, ['group_A', 'group_B', 'group_C'], denoting groups that each have two possible values, e.g. group_A has possible values a1 and a2. The grouping hierarchy is as follows:
>>> display(df.groupby(['group_A', 'group_B', 'group_C']).size())
group_A  group_B  group_C  row_count
a1       b1       c1       1
         b2       c1       2
                  c2       2
a2       b1       c1       1
         b2       c1       1
                  c2       8
For example, 8 rows of the DataFrame have group values ['a2', 'b2', 'c2']. No rows have ['a1', 'b1', 'c2'].
I can extract the group-level membership sizes using .size(). So at the final grouping level, I can get the row_count values [1,2,2,1,1,8]. How do I also extract the group values at the same grouping level, which I expect to be [c1,c1,c2,c1,c1,c2], in the order they appear in the hierarchy?
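For reproducibility, here is a minimal sketch of a DataFrame with the group counts described in the question. The row order and the `counts` helper dict are assumptions made for illustration; only the column names and the per-group counts come from the question.

```python
import pandas as pd

# (group_A, group_B, group_C) -> number of rows, taken from the counts above.
# The dict itself is a hypothetical helper, not part of the original data.
counts = {
    ("a1", "b1", "c1"): 1,
    ("a1", "b2", "c1"): 2,
    ("a1", "b2", "c2"): 2,
    ("a2", "b1", "c1"): 1,
    ("a2", "b2", "c1"): 1,
    ("a2", "b2", "c2"): 8,
}

# Repeat each group key once per row it should contribute.
rows = [key for key, n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["group_A", "group_B", "group_C"])

# Group sizes at the deepest level of the hierarchy.
sizes = df.groupby(["group_A", "group_B", "group_C"]).size()
print(len(df))         # 15
print(sizes.tolist())  # [1, 2, 2, 1, 1, 8]
```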
CodePudding user response:
size() returns a Series whose index is a MultiIndex, which you can convert to an array to extract the group names.
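import numpy as np
multi_index = df.groupby(["group_A", "group_B", "group_C"]).size().index
arr = np.array(multi_index)  # array of (group_A, group_B, group_C) tuples
arr = np.apply_along_axis(list, 0, arr)  # expand into one column per level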
[In] arr[:, 2]
[Out] array(['c1', 'c1', 'c2', 'c1', 'c1', 'c2'], dtype='<U2')
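As a possible alternative that avoids the NumPy round trip, the labels of one level can be read directly from the index with `get_level_values`. This is a sketch on hypothetical data rebuilt from the counts in the question; the `counts` dict and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical data reproducing the group counts from the question.
counts = {
    ("a1", "b1", "c1"): 1,
    ("a1", "b2", "c1"): 2,
    ("a1", "b2", "c2"): 2,
    ("a2", "b1", "c1"): 1,
    ("a2", "b2", "c1"): 1,
    ("a2", "b2", "c2"): 8,
}
rows = [key for key, n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["group_A", "group_B", "group_C"])

sizes = df.groupby(["group_A", "group_B", "group_C"]).size()

# Labels of the innermost level, aligned one-to-one with the sizes.
group_c = sizes.index.get_level_values("group_C").tolist()
row_count = sizes.tolist()

print(group_c)    # ['c1', 'c1', 'c2', 'c1', 'c1', 'c2']
print(row_count)  # [1, 2, 2, 1, 1, 8]
```

Because both lists come from the same Series, their positions line up without any extra sorting.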
CodePudding user response:
If you just want the values of the column, then this should do it.
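# size() alone returns a Series with no 'group_C' key, so group by all three
# columns and turn the group levels back into columns with reset_index.
df_grouped = df.groupby(['group_A','group_B','group_C']).size().reset_index(name='row_count')
print(df_grouped['group_C'].values)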