Home > database >  How can we perform group by single column without aggregation in pandas?
How can we perform group by single column without aggregation in pandas?

Time:09-21

I have a couple of questions about the group by function. 1. I would like to group by pandas data frame by single column without aggregation. 2. After group by, I would like to split the dataset into several datasets by the month date. So, I wasn't able to do so, I am requesting help. I would appreciate it if you guys can help me. I have provided the code, expected results, and dataset below.

Original dataframe

data = {'month': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01',], 
'Name': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], 
'num': [1234, 1234, 1234, 456, 456, 456, 456, 100, 200,],
}

df = pd.DataFrame(data)
df

enter image description here

Expected Result for question #1

enter image description here

And I want to split the dataset into different datasets like this Expected Result for question #2 enter image description here

Thank You

CodePudding user response:

try:

df.sort_values(['month', 'Name'])
    month       Name    num
0   2022-01-01  A       1234
3   2022-01-01  B       456
6   2022-01-01  C       456
1   2022-02-01  A       1234
4   2022-02-01  B       456
7   2022-02-01  C       100
2   2022-03-01  A       1234
5   2022-03-01  B       456
8   2022-03-01  C       200

D = []
for i in df['month'].unique():
    print(i)
    D.append(df.loc[df['month'].eq(i)])

#D is now list of separate dataframes

D[0]
    month       Name    num
0   2022-01-01  A       1234
3   2022-01-01  B       456
6   2022-01-01  C       456

D[1]
    month       Name    num
1   2022-02-01  A       1234
4   2022-02-01  B       456
7   2022-02-01  C       100

type(D[2])
pandas.core.frame.DataFrame

#-----
if you want to sort by only one column, then you need to create this one column as string:
df['monthName'] = df['month']   df['Name']
df.sort_values('monthName')
    month       Name    num     monthName
0   2022-01-01  A       1234    2022-01-01A
3   2022-01-01  B       456     2022-01-01B
6   2022-01-01  C       456     2022-01-01C
1   2022-02-01  A       1234    2022-02-01A
4   2022-02-01  B       456     2022-02-01B
7   2022-02-01  C       100     2022-02-01C
2   2022-03-01  A       1234    2022-03-01A
5   2022-03-01  B       456     2022-03-01B
8   2022-03-01  C       200     2022-03-01C

you can drop this column as long as you sort:
df.sort_values('monthName').drop(columns='monthName')

CodePudding user response:

  1. Sort the dataframe using sort_values:

    df.sort_values('month')

  2. Use groupby and assign result to a dictionary like this:

    dict(tuple(df.groupby('month')))

You can access each separated part of a the dataframe using the month as the dictionary key.

  • Related