How to group a DataFrame by some columns and present selected columns in the output-CodePudding

all. I don't know if this is ultimately a sort_values or groupby question.

I have a simple DataFrame with 5 columns: A, B, C, D, E

I'd like to group on A, B, C and have D and E in the results.

I'd like to do a groupby(['A','B','C]) and display the DataFrame in sections as illustrated below.

The closest I could get was doing df2 = df.groupby(['A','B','C']).apply(list) which produced the following:

A       B            C
----------------------------------------------
fooA1   fooB1        fooC1     [A, B, C, D, E]
        fooB2        fooC2     [A, B, C, D, E]
                     fooC3     [A, B, C, D, E]
        fooB3        fooC4     [A, B, C, D, E]
                     fooC5     [A, B, C, D, E]
        fooB4        fooC6     [A, B, C, D, E]
                     fooC7     [A, B, C, D, E]
fooA2   ...

which is really close to what I want, except here [A, B, C, D, E] was literally [A, B, C, D, E]; i.e the column names and not any actual values from those columns.

I was able to use sort_values like this to produce a report:

df.sort_values(by=['A','B','C'], inplace=True)
df = df.reset_index()
for index, row in df.iterrows():
    print(f"{row['A']} {row['B']} {row['C']} {row['D']} {row['E']}")

which produces the following:

fooA1  fooB1  fooC1  fooD...  fooE...
fooA1  fooB2  fooC2  fooD...  fooE...
fooA1  fooB2  fooC3  fooD...  fooE...
fooA1  fooB3  fooC4  fooD...  fooE...
fooA1  fooB3  fooC5  fooD...  fooE...
fooA1  fooB4  fooC6  fooD...  fooE...
fooA1  fooB4  fooC7  fooD...  fooE...
fooA2  ...

which is good enough for what I ultimately need, but I'd really like to understand where I was going wrong with the groupby, and how I could produce a cleaner looking report.

Thank you so much!

CodePudding user response：

You are using it right df2 = df.groupby(['A','B','C']).apply(list)

but instead of .apply(list) , use : .agg(lambda x: list(x))

df2 = df.groupby(['A','B','C']).agg(lambda x: list(x))

This will get for all grouped values A B C, values from rows D and E in a list respectively.