I have a dataframe that look like below.
id name tag location date
1 John 34 FL 01/12/1990
1 Peter 32 NC 01/12/1990
1 Dave 66 SC 11/25/1990
1 Mary 12 CA 03/09/1990
1 Sue 29 NY 07/10/1990
1 Eve 89 MA 06/12/1990
: : : : :
n John 34 FL 01/12/2000
n Peter 32 NC 01/12/2000
n Dave 66 SC 11/25/1999
n Mary 12 CA 03/09/1999
n Sue 29 NY 07/10/1998
n Eve 89 MA 06/12/1997
I need to find the location information based on the id column but with one condition, only need the earliest date. For example, the earliest date for id=1 group is 01/12/1990, which means the location is FL and NC. Then apply it to all the different id group to get the top 3 locations. I have written the code to do this for me.
#Get the earliest date base on id group
df_ear = df.loc[df.groupby('id')['date'].idxmin()]
#Count the occurancees of the location
df_ear['location'].value_counts()
The code works perfectly fine but it cannot return more than 1 location (using my first line of code) if they have the same earliest date, for example, id=1 group will only return FL instead FL and NC. I am wondering how can I fix my code to include the condition that if the earliest date is more than 1.
Thanks!
CodePudding user response:
Use GroupBy.transform
for Series for minimal date per groups, so possible compare by column Date
in boolean indexing
:
df['date'] = pd.to_datetime(df['date'])
df_ear = df[df.groupby('id')['date'].transform('min').eq(df['date'])]