I have attributes like color and size. For each id and each attribute value, I want to collapse the rows into a single period that runs from the first start_date to the last end_date.
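id  color   size  start_date  end_date
A1  blue    m     1/1/2022    3/1/2022
A1  blue    l     3/1/2022    5/1/2022
A1  yellow  l     5/1/2022    NaN

Expected output for color:

id  color   start_date  end_date
A1  blue    1/1/2022    5/1/2022
A1  yellow  5/1/2022    NaN

Expected output for size:

id  size  start_date  end_date
A1  m     1/1/2022    3/1/2022
A1  l     3/1/2022    NaN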
CodePudding user response:
By default, .groupby drops rows whose group key is NaN; passing dropna=False keeps them:
for group_label, group_df in df.groupby(['start_date', 'end_date'], dropna=False):
    ...
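
To make the effect of dropna concrete, here is a minimal sketch that rebuilds the sample frame from the question and counts the groups both ways. It assumes the missing end_date is a real NaN (not the string 'NaN') and a pandas version that supports the dropna argument (1.1 or later).

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks an open-ended end_date.
df = pd.DataFrame({
    "id": ["A1", "A1", "A1"],
    "color": ["blue", "blue", "yellow"],
    "size": ["m", "l", "l"],
    "start_date": ["1/1/2022", "3/1/2022", "5/1/2022"],
    "end_date": ["3/1/2022", "5/1/2022", np.nan],
})

keys = ["start_date", "end_date"]

# Default behaviour: the row whose end_date is NaN belongs to no group.
default_sizes = df.groupby(keys).size()

# dropna=False: the NaN key forms its own group, so every row is kept.
all_sizes = df.groupby(keys, dropna=False).size()

print(len(default_sizes), len(all_sizes))
```

With this data the default call should yield one group fewer than the dropna=False call, because the yellow row has no end_date.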
CodePudding user response:
df.fillna('NaN').groupby('size').agg({'id':'first', 'start_date':'first', 'end_date':'last'}).reset_index()

  size  id start_date  end_date
0    l  A1   3/1/2022       NaN
1    m  A1   1/1/2022  3/1/2022

df.fillna('NaN').groupby('color').agg({'id':'first', 'start_date':'first', 'end_date':'last'}).reset_index()

    color  id start_date  end_date
0    blue  A1   1/1/2022  5/1/2022
1  yellow  A1   5/1/2022       NaN
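
The fillna('NaN') step matters because the 'last' aggregation skips missing values, so without it the open-ended period for size l would end at 5/1/2022 instead of NaN. As a possible variant, the sketch below keeps real NaN values, takes id into account, and merges only consecutive rows with the same value. collapse_runs is a hypothetical helper name, and the code assumes the rows are already ordered by id and start_date.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks an open-ended end_date.
df = pd.DataFrame({
    "id": ["A1", "A1", "A1"],
    "color": ["blue", "blue", "yellow"],
    "size": ["m", "l", "l"],
    "start_date": ["1/1/2022", "3/1/2022", "5/1/2022"],
    "end_date": ["3/1/2022", "5/1/2022", np.nan],
})


def collapse_runs(frame, attr):
    """Hypothetical helper: merge consecutive rows sharing `attr` within an id.

    Assumes `frame` is already sorted by id and start_date.
    """
    # A new run starts whenever the id or the attribute differs from the previous row.
    changed = (frame[attr] != frame[attr].shift()) | (frame["id"] != frame["id"].shift())
    run_id = changed.cumsum()
    return (
        frame.groupby(run_id)
        .agg(
            id=("id", "first"),
            **{attr: (attr, "first")},
            start_date=("start_date", "first"),
            # 'last' would skip NaN, so take the final row of the run explicitly.
            end_date=("end_date", lambda s: s.iloc[-1]),
        )
        .reset_index(drop=True)
    )


by_color = collapse_runs(df, "color")
by_size = collapse_runs(df, "size")
print(by_color)
print(by_size)
```

Unlike a plain groupby on the attribute, this keeps rows in their original order and would treat a value that reappears later (blue, yellow, blue) as two separate periods; whether that is wanted depends on the data.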