How to use pandas groupby with an unknown number of columns?

I just want to know how to use this command, df.groupby(['column1', 'column2', 'column3']).size(), with a variable number of columns.

CodePudding user response：

If you need to group by all columns of df, use:

df.groupby(df.columns.tolist()).size()

If you need to group by all columns except ID, use Index.difference (sort=False keeps the original column order):

df.groupby(df.columns.difference(['ID'], sort=False).tolist()).size()
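To see how the two variants differ, here is a minimal sketch on a made-up DataFrame. The column names color and shape and all values are assumptions for illustration only; only ID comes from the answer above.

```python
import pandas as pd

# Hypothetical sample data; column names other than ID are made up.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "color": ["red", "red", "blue", "red"],
    "shape": ["box", "box", "ball", "ball"],
})

# Group by every column. ID is unique per row, so every group has size 1.
all_cols = df.columns.tolist()
sizes_all = df.groupby(all_cols).size()

# Group by every column except ID; sort=False keeps the original column order.
cols_no_id = df.columns.difference(["ID"], sort=False).tolist()
sizes_no_id = df.groupby(cols_no_id).size()

print(sizes_all)
print(sizes_no_id)
```

With a unique ID column, the first result is rarely useful, which is why excluding ID is usually what you want.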

CodePudding user response：

Given your previous question, you likely want to group by all columns except the ID (otherwise every group would contain only a single row):

cols = df.columns.drop(['ID']).tolist()
df.groupby(cols).size()

NB. You can add any other column names that need to be excluded to the list passed to drop.
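As a rough sketch of excluding more than one column, the list passed to drop can itself be a variable. The timestamp, color and shape columns and the exclude name below are hypothetical, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; only the ID column comes from the thread.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "timestamp": [10, 20, 30, 40],
    "color": ["red", "red", "blue", "red"],
    "shape": ["box", "box", "ball", "box"],
})

# Columns that should not take part in the grouping.
exclude = ["ID", "timestamp"]

# Index.drop raises KeyError if a label is missing from the columns;
# pass errors="ignore" if some names in exclude may be absent.
cols = df.columns.drop(exclude).tolist()
counts = df.groupby(cols).size()
print(counts)
```

Index.drop keeps the remaining columns in their original order, so no sort argument is needed here.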

CodePudding user response：

You can give groupby a list of columns.

grouper = ['column1', 'column2', 'column3']

df.groupby(grouper).size()

There is no difference between passing a list literal directly as the argument and assigning the list to a variable first. Since grouper is an ordinary Python list, you can build or change it dynamically before calling groupby.
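One possible way to wrap this up is a small helper that accepts any number of column names. The function name group_sizes and the sample values are assumptions made for this sketch; they are not part of pandas.

```python
import pandas as pd

def group_sizes(frame, *columns):
    """Hypothetical helper: count rows per combination of the given columns."""
    if not columns:
        raise ValueError("pass at least one column name")
    # groupby accepts a list of column labels of any length.
    return frame.groupby(list(columns)).size()

# Made-up data using the column names from the question.
df = pd.DataFrame({
    "column1": ["a", "a", "b", "b"],
    "column2": ["x", "y", "y", "y"],
    "column3": [1, 1, 2, 2],
})

# Build the list of columns at runtime and grow it as needed.
grouper = ["column1"]
one = group_sizes(df, *grouper)

grouper.append("column2")
two = group_sizes(df, *grouper)

print(one)
print(two)
```

The same list could just as well come from user input or a config file, as long as every name in it is a column of df.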