I am trying to group a DataFrame by a time frequency. Can I get all the columns in the result, instead of only the columns specified in the groupby?
Code:
data.columns = ['time', 'age', 'salary', 'amount', 'university', 'gender', 'place', 'education']
DF:
time age salary amount university gender place education
12/6/2021 24 33333 232323 SK M US BE
12/6/2021 24 33333 232323 SK M US BE
12/8/2021 30 23656 9496 SE F UK BARC
12/9/2021 34 65652 26266 DE M UK BTECH
12/6/2021 25 89893 2652 NK F GER BSC
12/6/2021 25 89893 2652 NK F GER BSC
12/8/2021 70 445464 78989 SE F UK BARC
12/9/2021 45 65656 225415 NK F GER BTECH
12/6/2021 29 5996 3232 NK M CAN BTECH
full_data = data.groupby([pd.Grouper(key='time', freq='4min'),'age', 'salary', 'amount','university']).size().reset_index(name='counts')
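A self-contained reproduction of the question's setup may help show what comes back. It assumes the `time` strings are month/day/year dates and converts them with `pd.to_datetime`, because `pd.Grouper` with `freq` needs a datetime column. Only the grouping keys and `counts` survive the `size()` call:

```python
import pandas as pd

# Sample rows from the question (assumption: 'time' is month/day/year).
data = pd.DataFrame(
    [
        ["12/6/2021", 24, 33333, 232323, "SK", "M", "US", "BE"],
        ["12/6/2021", 24, 33333, 232323, "SK", "M", "US", "BE"],
        ["12/8/2021", 30, 23656, 9496, "SE", "F", "UK", "BARC"],
        ["12/9/2021", 34, 65652, 26266, "DE", "M", "UK", "BTECH"],
        ["12/6/2021", 25, 89893, 2652, "NK", "F", "GER", "BSC"],
        ["12/6/2021", 25, 89893, 2652, "NK", "F", "GER", "BSC"],
        ["12/8/2021", 70, 445464, 78989, "SE", "F", "UK", "BARC"],
        ["12/9/2021", 45, 65656, 225415, "NK", "F", "GER", "BTECH"],
        ["12/6/2021", 29, 5996, 3232, "NK", "M", "CAN", "BTECH"],
    ],
    columns=["time", "age", "salary", "amount", "university",
             "gender", "place", "education"],
)
# pd.Grouper(freq=...) requires a datetime column.
data["time"] = pd.to_datetime(data["time"], format="%m/%d/%Y")

full_data = (
    data.groupby([pd.Grouper(key="time", freq="4min"),
                  "age", "salary", "amount", "university"])
    .size()
    .reset_index(name="counts")
)
# gender, place and education are not grouping keys, so they are dropped.
print(full_data.columns.tolist())
```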
Expected:
time age salary amount university gender place education counts
12/6/2021 24 33333 232323 SK M US BE 2
12/8/2021 30 23656 9496 SE F UK BARC 1
12/9/2021 34 65652 26266 DE M UK BTECH 1
12/6/2021 25 89893 2652 NK F GER BSC 2
12/8/2021 70 445464 78989 SE F UK BARC 1
12/9/2021 45 65656 225415 NK F GER BTECH 1
12/6/2021 29 5996 3232 NK M CAN BTECH 1
The result of the above code contains only the grouping columns (time, age, salary, amount, university) plus counts. Is there a way to keep all the columns?
Answer:
One option is to add a counts column with transform('size') and then remove duplicates on the grouping columns, e.g.:
data['counts'] = data.groupby([pd.Grouper(key='time', freq='4min'),'age', 'salary', 'amount','university'])['age'].transform('size')
df = data.drop_duplicates(['time', 'age', 'salary', 'amount', 'university'])
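A runnable sketch of this first approach on the sample data. One deviation from the answer, stated as an assumption: a hypothetical helper column `bucket` holds each timestamp floored to 4 minutes with `Series.dt.floor`, and it replaces `pd.Grouper` so that the count and the de-duplication use exactly the same time bin. For a 4-minute frequency the floored values should line up with the default `pd.Grouper` bins, since 4 minutes divides a day evenly.

```python
import pandas as pd

data = pd.DataFrame(
    [
        ["12/6/2021", 24, 33333, 232323, "SK", "M", "US", "BE"],
        ["12/6/2021", 24, 33333, 232323, "SK", "M", "US", "BE"],
        ["12/8/2021", 30, 23656, 9496, "SE", "F", "UK", "BARC"],
        ["12/9/2021", 34, 65652, 26266, "DE", "M", "UK", "BTECH"],
        ["12/6/2021", 25, 89893, 2652, "NK", "F", "GER", "BSC"],
        ["12/6/2021", 25, 89893, 2652, "NK", "F", "GER", "BSC"],
        ["12/8/2021", 70, 445464, 78989, "SE", "F", "UK", "BARC"],
        ["12/9/2021", 45, 65656, 225415, "NK", "F", "GER", "BTECH"],
        ["12/6/2021", 29, 5996, 3232, "NK", "M", "CAN", "BTECH"],
    ],
    columns=["time", "age", "salary", "amount", "university",
             "gender", "place", "education"],
)
data["time"] = pd.to_datetime(data["time"], format="%m/%d/%Y")

keys = ["age", "salary", "amount", "university"]

# Hypothetical helper column: each timestamp floored to its 4-minute bin.
data["bucket"] = data["time"].dt.floor("4min")

# Count rows per (bin, keys) group and broadcast the count back to every row.
data["counts"] = data.groupby(["bucket"] + keys)["age"].transform("size")

# Keep the first row of each group; every original column is preserved.
result = (
    data.drop_duplicates(["bucket"] + keys)
    .drop(columns="bucket")
    .reset_index(drop=True)
)
print(result)
```

Because `drop_duplicates` keeps the first row of each group, gender, place and education are taken from that row; if those values can differ within a group, you have to decide explicitly which one to keep.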
Or, if the remaining columns have the same values within each group, add all of them to the groupby keys:
full_data = data.groupby([pd.Grouper(key='time', freq='4min'),'age', 'salary', 'amount','university', 'gender', 'place', 'education']).size().reset_index(name='counts')
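A sketch of the second approach on the same sample data, again assuming month/day/year dates. It yields one row per logical group only when gender, place and education are constant within each group; if they vary, such a group is split across several rows. Note that the output is sorted by the group keys rather than kept in the original row order.

```python
import pandas as pd

data = pd.DataFrame(
    [
        ["12/6/2021", 24, 33333, 232323, "SK", "M", "US", "BE"],
        ["12/6/2021", 24, 33333, 232323, "SK", "M", "US", "BE"],
        ["12/8/2021", 30, 23656, 9496, "SE", "F", "UK", "BARC"],
        ["12/9/2021", 34, 65652, 26266, "DE", "M", "UK", "BTECH"],
        ["12/6/2021", 25, 89893, 2652, "NK", "F", "GER", "BSC"],
        ["12/6/2021", 25, 89893, 2652, "NK", "F", "GER", "BSC"],
        ["12/8/2021", 70, 445464, 78989, "SE", "F", "UK", "BARC"],
        ["12/9/2021", 45, 65656, 225415, "NK", "F", "GER", "BTECH"],
        ["12/6/2021", 29, 5996, 3232, "NK", "M", "CAN", "BTECH"],
    ],
    columns=["time", "age", "salary", "amount", "university",
             "gender", "place", "education"],
)
data["time"] = pd.to_datetime(data["time"], format="%m/%d/%Y")

# Group by the 4-minute time bin plus every other column.
full_data = (
    data.groupby([pd.Grouper(key="time", freq="4min"),
                  "age", "salary", "amount", "university",
                  "gender", "place", "education"])
    .size()
    .reset_index(name="counts")
)
print(full_data)
```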