I have the data frame t below:
import pandas as pd
t = pd.DataFrame(data = (['AFG','Afghanistan',38928341],
['CHE','Switzerland',8654618],
['SMR','San Marino', 33938]), columns = ['iso_code', 'location', 'population'])
g = t.groupby('location')
g.size()
I can see in each group there's only one record, which is expected.
However, if I run the code below, it does not raise any error:
g.first(10)
It shows:
             population
location
Afghanistan    38928341
San Marino        33938
Switzerland     8654618
My understanding is that first(n) returns the nth record of each group, but each of my location groups has only one record, so how did pandas give me that record?
Thanks
CodePudding user response:
I think you're looking for g.nth(10). g.first(10) is NOT doing what you think it is. The first (optional) parameter of first is numeric_only and takes a boolean, so you're actually running g.first(numeric_only=True), as bool(10) evaluates to True.
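To make this concrete, here is a minimal sketch using the frame from the question. It assumes a pandas version where first still accepts numeric_only positionally; the variable names (by_position, by_keyword, eleventh) are only illustrative.

```python
import pandas as pd

t = pd.DataFrame(
    [["AFG", "Afghanistan", 38928341],
     ["CHE", "Switzerland", 8654618],
     ["SMR", "San Marino", 33938]],
    columns=["iso_code", "location", "population"],
)
g = t.groupby("location")

# The positional 10 lands in numeric_only and is treated as truthy,
# so the two calls below should give the same result.
by_position = g.first(10)
by_keyword = g.first(numeric_only=True)
same = by_position.equals(by_keyword)

# nth is zero-based, so nth(10) asks for the 11th row of each group.
# No group here has that many rows, so the result is empty.
eleventh = g.nth(10)

print(same)
print(by_keyword)
print(len(eleventh))
```

That also explains why only the population column appears in the output: iso_code is not numeric, so numeric_only=True drops it.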
CodePudding user response:
After reading the comments from mozway, Henry Ecker, and sammywemmy, I finally got it.
t = pd.DataFrame(data = (['AFG','Afghanistan',38928341,'A1'],
['CHE','Switzerland',8654618,'C1'],
['SMR','San Marino', 33938,'S1'],
['AFG','Afghanistan',38928342,'A2'] ,
['AFG','Afghanistan',38928343, 'A3'] ), columns = ['iso_code', 'location', 'population', 'code'])
g = t.groupby('location')
Then
g.nth(0)
g.nth(1)
g.first(True)
g.first(False)
g.first(min_count=2)
shows the difference
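For anyone who wants to check the differences without reading the printed frames, here is a rough sketch on the same five-row data. The variable names are made up for illustration, and the index and columns of the nth result differ between pandas versions, so the sketch only looks at its row count and the population value.

```python
import pandas as pd

t = pd.DataFrame(
    [["AFG", "Afghanistan", 38928341, "A1"],
     ["CHE", "Switzerland", 8654618, "C1"],
     ["SMR", "San Marino", 33938, "S1"],
     ["AFG", "Afghanistan", 38928342, "A2"],
     ["AFG", "Afghanistan", 38928343, "A3"]],
    columns=["iso_code", "location", "population", "code"],
)
g = t.groupby("location")

# nth(1) picks the second row of each group; only Afghanistan has one.
second_rows = g.nth(1)

# first(True) keeps numeric columns only; first(False) keeps all of them.
numeric_first = g.first(numeric_only=True)
all_first = g.first(numeric_only=False)

# min_count=2 requires at least two non-null values per group,
# otherwise the group's result is NA.
at_least_two = g.first(min_count=2)

print(len(second_rows), second_rows["population"].iloc[0])
print(list(numeric_first.columns))
print(list(all_first.columns))
print(at_least_two["population"])
```

So nth selects a row by position, while first returns the first non-null value per column, and its arguments only control which columns are kept and how many values a group needs.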