all. I don't know if this is ultimately a sort_values
or groupby
question.
I have a simple DataFrame with 5 columns: A, B, C, D, E
I'd like to group on A, B, C and have D and E in the results.
I'd like to do a groupby(['A','B','C])
and display the DataFrame in sections as illustrated below.
The closest I could get was doing df2 = df.groupby(['A','B','C']).apply(list)
which produced the following:
A B C
----------------------------------------------
fooA1 fooB1 fooC1 [A, B, C, D, E]
fooB2 fooC2 [A, B, C, D, E]
fooC3 [A, B, C, D, E]
fooB3 fooC4 [A, B, C, D, E]
fooC5 [A, B, C, D, E]
fooB4 fooC6 [A, B, C, D, E]
fooC7 [A, B, C, D, E]
fooA2 ...
which is really close to what I want, except here [A, B, C, D, E]
was literally [A, B, C, D, E]
; i.e the column names and not any actual values from those columns.
I was able to use sort_values
like this to produce a report:
df.sort_values(by=['A','B','C'], inplace=True)
df = df.reset_index()
for index, row in df.iterrows():
print(f"{row['A']} {row['B']} {row['C']} {row['D']} {row['E']}")
which produces the following:
fooA1 fooB1 fooC1 fooD... fooE...
fooA1 fooB2 fooC2 fooD... fooE...
fooA1 fooB2 fooC3 fooD... fooE...
fooA1 fooB3 fooC4 fooD... fooE...
fooA1 fooB3 fooC5 fooD... fooE...
fooA1 fooB4 fooC6 fooD... fooE...
fooA1 fooB4 fooC7 fooD... fooE...
fooA2 ...
which is good enough for what I ultimately need, but I'd really like to understand where I was going wrong with the groupby
, and how I could produce a cleaner looking report.
Thank you so much!
CodePudding user response:
You are using it right df2 = df.groupby(['A','B','C']).apply(list)
but instead of .apply(list)
, use : .agg(lambda x: list(x))
df2 = df.groupby(['A','B','C']).agg(lambda x: list(x))
This will get for all grouped values A B C, values from rows D and E in a list respectively.