I'm working with a dataset in Python that has three columns: the state (ST), the city (City), and the number of certificates (CERT). I have to find the minimum number of certificates for each state which I have done with the following code:
df.groupby(["ST"])["CERT"].min()
but this returns only the state and the number of certificates when I also need the city to be outputted. I need the city in each state that has the minimum number of certificates. When I run the following code:
df.groupby(["ST", "City"])["CERT"].min()
I am getting the minimum for every single city, when I need the minimum for each state with the city that that minimum belongs to. Is there a way to include the City column outside of the .groupby function?
CodePudding user response:
You could use groupby
idxmin
to get the index of the minimum values, then filter df
with it using loc
:
index_of_min_certificates = df.groupby('ST')['CERT'].idxmin()
out = df.loc[index_of_min_certificates]
For example, for df
like below:
ST City CERT
0 CA LA 0
1 CA SF 1
2 NY NYC 2
3 NY Albany 3
the above code produces:
ST City CERT
0 CA LA 0
2 NY NYC 2