Python pandas group rows by date

Time:06-09

I have attributes such as color and size that I want to group by start_date and end_date, aggregating by id and attribute value:

Input:

id  color   size  start_date  end_date
A1  blue    m     1/1/2022    3/1/2022
A1  blue    l     3/1/2022    5/1/2022
A1  yellow  l     5/1/2022    NaN

Desired output:

A1  blue    1/1/2022  5/1/2022
A1  yellow  5/1/2022  NaN
A1  m       1/1/2022  3/1/2022
A1  l       3/1/2022  NaN

Answer:

By default, .groupby drops any group whose key contains NaN, so the rows with a missing end_date disappear. Passing dropna=False (available since pandas 1.1) keeps them:

for group_label, group_df in df.groupby(['start_date', 'end_date'], dropna=False):
    # group_label is a (start_date, end_date) tuple; end_date may be NaN
    print(group_label, len(group_df))
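A small self-contained check of the dropna behaviour, built from the question's input rows. It assumes a pandas version that supports the dropna argument to groupby (1.1 or later); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Input rows from the question; the open-ended row has end_date = NaN
df = pd.DataFrame({
    "id": ["A1", "A1", "A1"],
    "color": ["blue", "blue", "yellow"],
    "size": ["m", "l", "l"],
    "start_date": ["1/1/2022", "3/1/2022", "5/1/2022"],
    "end_date": ["3/1/2022", "5/1/2022", np.nan],
})

# Default (dropna=True): the group whose end_date is NaN is silently dropped
default_groups = df.groupby(["start_date", "end_date"]).size()

# dropna=False keeps the NaN-keyed group
all_groups = df.groupby(["start_date", "end_date"], dropna=False).size()

print(len(default_groups))  # 2
print(len(all_groups))      # 3
```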

Answer:

df.fillna('NaN').groupby('size').agg({'id':'first', 'start_date':'first', 'end_date':'last'}).reset_index()
 
  size  id start_date  end_date
0    l  A1   3/1/2022       NaN
1    m  A1   1/1/2022  3/1/2022

df.fillna('NaN').groupby('color').agg({'id':'first', 'start_date':'first', 'end_date':'last'}).reset_index()
 
    color  id start_date  end_date
0    blue  A1   1/1/2022  5/1/2022
1  yellow  A1   5/1/2022       NaN
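Two notes on the answer above. The fillna('NaN') step matters because GroupBy.last returns the last non-null value, so without it the open-ended end_date would be replaced by an earlier date. Also, first/last assume each attribute value occupies one contiguous block of rows; if a value reappears later (blue, then yellow, then blue again), both blue periods would merge into one range, and 'id': 'first' would collapse rows from different ids. The sketch below swaps in a standard consecutive-run technique (shift/cumsum) rather than the answer's method. It melts the attribute columns into long form and starts a new group whenever id, attribute, or value changes. Assumptions: dates are month/day/year strings, each id's rows are back-to-back periods, and merge_runs is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ["A1", "A1", "A1"],
    "color": ["blue", "blue", "yellow"],
    "size": ["m", "l", "l"],
    "start_date": ["1/1/2022", "3/1/2022", "5/1/2022"],
    "end_date": ["3/1/2022", "5/1/2022", np.nan],
})


def merge_runs(frame, attrs):
    """Hypothetical helper: collapse consecutive rows with the same attribute value."""
    # Assumed month/day/year format; NaN end dates become NaT
    frame = frame.assign(
        start_date=pd.to_datetime(frame["start_date"], format="%m/%d/%Y"),
        end_date=pd.to_datetime(frame["end_date"], format="%m/%d/%Y"),
    )
    # Long form: one row per (id, attribute, value, period)
    long = frame.melt(
        id_vars=["id", "start_date", "end_date"],
        value_vars=attrs,
        var_name="attribute",
        value_name="value",
    ).sort_values(["id", "attribute", "start_date"])
    # A new run starts whenever id, attribute or value differs from the previous row
    key = long[["id", "attribute", "value"]]
    run_id = key.ne(key.shift()).any(axis=1).cumsum()
    return (
        long.groupby(run_id)
        .agg(
            id=("id", "first"),
            attribute=("attribute", "first"),
            value=("value", "first"),
            start_date=("start_date", "first"),
            # Take the literal last row so an open-ended NaT end date is kept
            end_date=("end_date", lambda s: s.iloc[-1]),
        )
        .reset_index(drop=True)
    )


result = merge_runs(df, ["color", "size"])
print(result)
```

Grouping per id inside the run key means several ids can be processed in one call, and a value that returns after a different value starts a fresh range instead of merging with the earlier one.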