Finding earliest date after groupby a specific column

Time:11-12

I have a dataframe that looks like the one below.

  id     name   tag   location       date
   1     John    34     FL        01/12/1990
   1     Peter   32     NC        01/12/1990
   1     Dave    66     SC        11/25/1990
   1     Mary    12     CA        03/09/1990
   1     Sue     29     NY        07/10/1990
   1     Eve     89     MA        06/12/1990
   :      :      :      :              :
   n     John    34     FL        01/12/2000
   n     Peter   32     NC        01/12/2000
   n     Dave    66     SC        11/25/1999
   n     Mary    12     CA        03/09/1999
   n     Sue     29     NY        07/10/1998
   n     Eve     89     MA        06/12/1997

I need to find the location information for each id, but with one condition: I only need the rows with the earliest date. For example, the earliest date in the id=1 group is 01/12/1990, which means the locations are FL and NC. I then want to apply this to all the id groups and get the top 3 locations. I have written the following code to do this.

# Get the row with the earliest date in each id group
df_ear = df.loc[df.groupby('id')['date'].idxmin()]
# Count the occurrences of each location
df_ear['location'].value_counts()

The code works, but the first line cannot return more than one location when several rows share the same earliest date. For example, the id=1 group returns only FL instead of FL and NC. How can I fix my code so that it keeps every row tied for the earliest date?

Thanks!

CodePudding user response:

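Convert the date column to datetimes first, so that the minimum is chronological rather than a string comparison. Then use GroupBy.transform to get a Series holding the minimal date of each group, aligned with the original rows, and compare it with the date column in boolean indexing: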

df['date'] = pd.to_datetime(df['date'])

df_ear = df[df.groupby('id')['date'].transform('min').eq(df['date'])]
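
To see the approach end to end, here is a minimal sketch on a small made-up frame. The id=1 rows are taken from the question; the id=2 rows are a hypothetical stand-in for the elided groups, and the explicit format string assumes the dates are month/day/year as they appear in the sample.

```python
import pandas as pd

# Sample data: id=1 rows come from the question, id=2 rows are hypothetical.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "name": ["John", "Peter", "Dave", "Mary", "John", "Dave", "Eve"],
    "location": ["FL", "NC", "SC", "CA", "FL", "SC", "MA"],
    "date": ["01/12/1990", "01/12/1990", "11/25/1990", "03/09/1990",
             "01/12/2000", "11/25/1999", "06/12/1997"],
})

# Assumption: dates are month/day/year, as in the question's sample.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Per-row copy of each group's earliest date (same index as df).
earliest = df.groupby("id")["date"].transform("min")

# Keep every row whose date equals its group's earliest date, ties included.
df_ear = df[df["date"].eq(earliest)]

# Count locations among those rows and keep the three most frequent.
top_locations = df_ear["location"].value_counts().head(3)
print(df_ear)
print(top_locations)
```

Unlike idxmin, which yields a single index label per group, the boolean mask keeps both FL and NC for id=1 because both rows match the group minimum.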