Pandas DataFrame: Groupby.First - Index limitation?

Time:10-12

I have the data frame t below:

import pandas as pd
t = pd.DataFrame(data=[['AFG', 'Afghanistan', 38928341],
                       ['CHE', 'Switzerland', 8654618],
                       ['SMR', 'San Marino', 33938]],
                 columns=['iso_code', 'location', 'population'])

g = t.groupby('location')
g.size()

I can see in each group there's only one record, which is expected.

However, when I run the code below, it does not raise any error:

g.first(10)

It shows

    population
location    
Afghanistan 38928341
San Marino  33938
Switzerland 8654618

My understanding is that first(n) returns the nth record of each group, but each of my location groups has only one record, so how did pandas return that record?

Thanks

CodePudding user response:

I think you're looking for g.nth(10).

g.first(10) is NOT doing what you think it is. The first (optional) parameter of first is numeric_only, which takes a boolean, so you are actually running g.first(numeric_only=True), because bool(10) evaluates to True. That is also why only the numeric population column appears in your output.
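
To make the point above concrete, here is a minimal sketch using the data frame from the question. The variable names (flag, numeric_first, default_first, eleventh) are only illustrative, and the exact printed layout may differ between pandas versions.

```python
import pandas as pd

t = pd.DataFrame(
    data=[['AFG', 'Afghanistan', 38928341],
          ['CHE', 'Switzerland', 8654618],
          ['SMR', 'San Marino', 33938]],
    columns=['iso_code', 'location', 'population'],
)
g = t.groupby('location')

# Any non-zero integer is truthy, so 10 behaves like True in a boolean slot.
flag = bool(10)

# Explicit spelling of what a positional 10 is interpreted as:
# keep only the numeric columns, then take the first value per group.
numeric_first = g.first(numeric_only=True)

# The default call keeps the non-numeric iso_code column as well.
default_first = g.first()

# nth is zero-based; no group has an 11th row, so the result is empty.
eleventh = g.nth(10)

print(numeric_first)
print(default_first)
print(eleventh)
```

Passing the argument by keyword (numeric_only=True) rather than positionally makes this kind of mix-up much easier to spot.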

CodePudding user response:

After reading the comments from mozway and Henry Ecker / sammywemmy, I finally got it.

t = pd.DataFrame(data=[['AFG', 'Afghanistan', 38928341, 'A1'],
                       ['CHE', 'Switzerland', 8654618, 'C1'],
                       ['SMR', 'San Marino', 33938, 'S1'],
                       ['AFG', 'Afghanistan', 38928342, 'A2'],
                       ['AFG', 'Afghanistan', 38928343, 'A3']],
                 columns=['iso_code', 'location', 'population', 'code'])
g = t.groupby('location')

Then

g.nth(0)
g.nth(1)
g.first(True)
g.first(False)
g.first(min_count=2)

shows the difference
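
As a rough sketch of what those calls show, the snippet below rebuilds the five-row frame and looks at the population column only, to keep the output small. The variable names are illustrative, and the comments describe the behavior I would expect from a recent pandas release.

```python
import pandas as pd

t = pd.DataFrame(
    data=[['AFG', 'Afghanistan', 38928341, 'A1'],
          ['CHE', 'Switzerland', 8654618, 'C1'],
          ['SMR', 'San Marino', 33938, 'S1'],
          ['AFG', 'Afghanistan', 38928342, 'A2'],
          ['AFG', 'Afghanistan', 38928343, 'A3']],
    columns=['iso_code', 'location', 'population', 'code'],
)
g = t.groupby('location')

# nth(1) picks the second row of each group (zero-based);
# only Afghanistan has more than one row.
second_rows = g.nth(1)

# first() returns the first non-null value per group.
first_pop = g['population'].first()

# min_count=2 requires at least two valid values in a group;
# groups with fewer values come back as missing.
first_pop_min2 = g['population'].first(min_count=2)

print(second_rows)
print(first_pop)
print(first_pop_min2)
```

In short, nth selects rows by position within each group, while first aggregates each column to its first non-null value, optionally restricted by numeric_only and min_count.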
