Home > Enterprise >  Selecting columns outside of .groupby()
Selecting columns outside of .groupby()

Time:04-05

I'm working with a dataset in Python that has three columns: the state (ST), the city (City), and the number of certificates (CERT). I have to find the minimum number of certificates for each state which I have done with the following code:

df.groupby(["ST"])["CERT"].min()

but this returns only the state and the number of certificates when I also need the city to be outputted. I need the city in each state that has the minimum number of certificates. When I run the following code:

df.groupby(["ST", "City"])["CERT"].min()

I am getting the minimum for every single city, when I need the minimum for each state with the city that that minimum belongs to. Is there a way to include the City column outside of the .groupby function?

CodePudding user response:

You could use groupby idxmin to get the index of the minimum values, then filter df with it using loc:

index_of_min_certificates = df.groupby('ST')['CERT'].idxmin()
out = df.loc[index_of_min_certificates]

For example, for df like below:

   ST    City  CERT
0  CA      LA     0
1  CA      SF     1
2  NY     NYC     2
3  NY  Albany     3

the above code produces:

   ST City  CERT
0  CA   LA     0
2  NY  NYC     2
  • Related