My dataset df
looks like this:
main_id time day
1 2019-05-31 1
1 2019-06-30 1
1 2019-05-30 2
2 2018-05-30 1
I want to create a new df df_time
that contains a new column giving the very first value from each combination of main_id
and day
. For example, the output here could look like this:
main_id day first_occuring_time
1 1 2019-05-31
1 2 2019-05-30
2 1 2018-05-30
CodePudding user response:
df.groupby(['main_id', 'day']).agg(lambda x:x.iloc[0]).reset_index()
main_id day time
0 1 1 2019-05-31
1 1 2 2019-05-30
2 2 1 2018-05-30
oe even:
df.groupby(['main_id', 'day']).agg(lambda x:x.head(1)).reset_index()
main_id day time
0 1 1 2019-05-31
1 1 2 2019-05-30
2 2 1 2018-05-30
CodePudding user response:
Based on this answer, you could try in this way:
gr_df = df.groupby(["main_id", "day"])["time"].first().reset_index()
>>> main_id day time
0 1 1 2019-05-31
1 1 2 2019-05-30
2 2 1 2019-05-30
gr_df.rename({"time": "first_occurence"}, axis=1, inplace=True)
CodePudding user response:
You can try:
df.groupby(['main_id', 'day'], as_index=False).agg(
first_occuring_time = pd.NamedAgg(column='time', aggfunc='first'))
Output:
main_id day first_occuring_time
0 1 1 2019-05-31
1 1 2 2019-05-30
2 2 1 2018-05-30