How to get grouped dataframe?

Time:05-02

I have a simple dataframe:

import pandas as pd
import numpy  as np

# Build the frame in one call; DataFrame.append was removed in pandas 2.0.
df = pd.DataFrame({
    'name':     ['name_a', 'name_a', 'name_a', 'name_a', 'name_a', 'name_b', 'name_b'],
    'last':     ['last_a', 'last_a', 'last_a', 'last_b', 'last_b', 'last_b', 'last_b'],
    'test_num': [1, 2, 3, 1, 2, 1, 2],
    'grade':    [90, 100, 95, 50, 55, 90, 100],
})

df.head(10)

output:

     name    last  test_num  grade
0  name_a  last_a         1     90
1  name_a  last_a         2    100
2  name_a  last_a         3     95
3  name_a  last_b         1     50
4  name_a  last_b         2     55
5  name_b  last_b         1     90
6  name_b  last_b         2    100

I want to create a new dataframe with the following values:

    name    last
0   name_a  last_a
1   name_a  last_b
2   name_b  last_b

I have tried to use groupby:

df2 = df.groupby(['name', 'last'])[['name', 'last']]

but the result is a pandas.core.groupby.generic.DataFrameGroupBy object, not a DataFrame.

How can I get the output I want as a pandas.core.frame.DataFrame?

CodePudding user response:

You can use nth(0), head(1), tail(1), first() or last() to get one row from each group of a groupby object:

df2 = df.groupby(['name', 'last'], as_index=False)[['name', 'last']].nth(0)
print(df2)

     name    last
0  name_a  last_a
3  name_a  last_b
5  name_b  last_b
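
If the rows should be numbered 0, 1, 2 as in the question's target output, the index can be reset afterwards. The following is a minimal sketch using head(1), one of the options listed above; the variable names first_rows and result are only illustrative, and the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question, built in a single call.
df = pd.DataFrame({
    'name':     ['name_a', 'name_a', 'name_a', 'name_a', 'name_a', 'name_b', 'name_b'],
    'last':     ['last_a', 'last_a', 'last_a', 'last_b', 'last_b', 'last_b', 'last_b'],
    'test_num': [1, 2, 3, 1, 2, 1, 2],
    'grade':    [90, 100, 95, 50, 55, 90, 100],
})

# head(1) keeps the first row of each (name, last) group,
# with its original row label (0, 3 and 5 here).
first_rows = df.groupby(['name', 'last']).head(1)

# Keep only the key columns and renumber the rows from 0.
result = first_rows[['name', 'last']].reset_index(drop=True)
print(result)
```

The result is a regular DataFrame, so it can be used like any other frame from here on.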

CodePudding user response:

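You can concatenate the groups of your groupby object (df2 in the question) to turn it back into a DataFrame. Note that this keeps every row of every group, so repeated name/last pairs remain unless you drop them.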

df3 = pd.concat(dict(iter(df2)).values())
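
To see what concatenating the groups produces, here is a small self-contained sketch. It iterates the groupby directly instead of going through dict(iter(...)), and the final drop_duplicates step is not part of the answer above; it is an assumption about how to reach the three-row result the question asks for. The names grouped and unique_pairs are illustrative.

```python
import pandas as pd

# Same data as in the question, built in a single call.
df = pd.DataFrame({
    'name':     ['name_a', 'name_a', 'name_a', 'name_a', 'name_a', 'name_b', 'name_b'],
    'last':     ['last_a', 'last_a', 'last_a', 'last_b', 'last_b', 'last_b', 'last_b'],
    'test_num': [1, 2, 3, 1, 2, 1, 2],
    'grade':    [90, 100, 95, 50, 55, 90, 100],
})

grouped = df.groupby(['name', 'last'])

# Iterating a groupby yields (key, sub-frame) pairs; concatenating the
# sub-frames gives back a plain DataFrame containing all original rows.
df3 = pd.concat([group for _, group in grouped])[['name', 'last']]
print(len(df3))  # one row per original row, not one per group

# Assumption: dropping repeated pairs is what turns this into the
# one-row-per-group frame the question is after.
unique_pairs = df3.drop_duplicates().reset_index(drop=True)
print(unique_pairs)
```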