get first value from each group of 2 columns-CodePudding

My dataset df looks like this:

main_id    time               day      
1          2019-05-31         1        
1          2019-06-30         1        
1          2019-05-30         2  
2          2018-05-30         1

I want to create a new df df_time that contains a new column giving the very first value from each combination of main_id and day. For example, the output here could look like this:

main_id     day      first_occuring_time
1           1        2019-05-31
1           2        2019-05-30
2           1        2018-05-30

CodePudding user response：

df.groupby(['main_id', 'day']).agg(lambda x:x.iloc[0]).reset_index()

   main_id  day        time
0        1    1  2019-05-31
1        1    2  2019-05-30
2        2    1  2018-05-30

oe even:

df.groupby(['main_id', 'day']).agg(lambda x:x.head(1)).reset_index()
 
   main_id  day        time
0        1    1  2019-05-31
1        1    2  2019-05-30
2        2    1  2018-05-30

CodePudding user response：

Based on this answer, you could try in this way:

gr_df = df.groupby(["main_id", "day"])["time"].first().reset_index()

>>>            main_id day   time
         0        1    1  2019-05-31
         1        1    2  2019-05-30
         2        2    1  2019-05-30

gr_df.rename({"time": "first_occurence"}, axis=1, inplace=True)

CodePudding user response：

You can try:

df.groupby(['main_id', 'day'], as_index=False).agg(
                 first_occuring_time = pd.NamedAgg(column='time', aggfunc='first'))

Output:

   main_id  day first_occuring_time
0        1    1          2019-05-31
1        1    2          2019-05-30
2        2    1          2018-05-30