I would like to know how to use this command df.groupby(['column1', 'column2', 'column3']).size()
with a variable number of columns.
CodePudding user response:
If you need to group by all columns of df, use:
df.groupby(df.columns.tolist()).size()
If you need to group by all columns except ID, use Index.difference:
df.groupby(df.columns.difference(['ID'], sort=False).tolist()).size()
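To make the difference between the two calls concrete, here is a minimal sketch on a made-up DataFrame. The sample data and its column names are assumptions for illustration only, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "column1": ["a", "a", "b", "a"],
    "column2": ["x", "x", "y", "x"],
})

# Grouping by every column: ID makes each row unique here,
# so every group in this sample has size 1.
all_counts = df.groupby(df.columns.tolist()).size()

# Grouping by every column except ID; sort=False asks pandas
# not to sort the remaining column labels.
cols = df.columns.difference(["ID"], sort=False).tolist()
counts = df.groupby(cols).size()

print(all_counts)
print(counts)
```

With a unique ID column in the key, the counts carry no information, which is why excluding ID is usually what you want.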
CodePudding user response:
Given your previous question, you likely want to group by all columns except the ID (otherwise you would get only groups with single items):
cols = df.columns.drop(['ID']).tolist()
df.groupby(cols).size()
NB. You can add any other column names that need to be excluded to the list.
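As a rough sketch of excluding more than one column, assuming a made-up DataFrame with an extra "timestamp" column (both the data and the `exclude` name are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "timestamp": [10, 20, 30],
    "column1": ["a", "a", "b"],
    "column2": ["x", "x", "y"],
})

# Columns that should not take part in the grouping.
exclude = ["ID", "timestamp"]
cols = df.columns.drop(exclude).tolist()
counts = df.groupby(cols).size()
print(counts)

# Index.drop raises KeyError for a label that is not present;
# errors="ignore" skips such labels instead.
safe_cols = df.columns.drop(["ID", "not_a_column"], errors="ignore").tolist()
print(safe_cols)
```

Note that Index.drop is strict by default, so a misspelled column name fails loudly rather than being silently ignored.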
CodePudding user response:
You can give groupby a list of columns.
grouper = ['column1', 'column2', 'column3']
df.groupby(grouper).size()
There is no difference between writing the list directly as the function argument and assigning it to a variable first and passing that variable. Because grouper is an ordinary list, you can build or change its columns dynamically.
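One possible way to wrap this up for a variable number of columns is a small helper. The function name `count_groups` and the sample data below are hypothetical, just a sketch of the idea:

```python
import pandas as pd

def count_groups(frame, *columns):
    """Hypothetical helper: group by any number of column names and count rows."""
    return frame.groupby(list(columns)).size()

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "column1": ["a", "a", "b", "b"],
    "column2": ["x", "x", "y", "z"],
    "column3": [1, 1, 2, 2],
})

by_one = count_groups(df, "column1")
by_two = count_groups(df, "column1", "column2")

# The list can also be built up at runtime before it is passed to groupby.
grouper = ["column1"]
grouper.append("column3")
by_dynamic = df.groupby(grouper).size()

print(by_one)
print(by_two)
print(by_dynamic)
```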