I have the following data frame:
import pandas as pd

df = pd.DataFrame({'Date': ['2020-01-01', '2020-10-01', '2021-01-01', '2021-10-01'],
                   'ID': [101, 101, 102, 102],
                   'number': [10, 10, 11, 11]})
# currently looking like this
         Date   ID  number
0  2020-01-01  101      10
1  2020-10-01  101      10
2  2021-01-01  102      11
3  2021-10-01  102      11
Is there a way to restructure the data frame so that it has the following form?
   Start Date    End Date   ID  number
0  2020-01-01  2020-10-01  101      10
1  2021-01-01  2021-10-01  102      11
CodePudding user response:
We can use named aggregation with groupby aggregate to get the min and max Date per group and assign them to new column names. The reindex call is optional; it only reorders the columns to match the expected output:
new_df = df.groupby(
    ['ID', 'number'], as_index=False
).aggregate(
    **{'Start Date': ('Date', 'min'), 'End Date': ('Date', 'max')}
).reindex(columns=['Start Date', 'End Date', 'ID', 'number'])
new_df:
   Start Date    End Date   ID  number
0  2020-01-01  2020-10-01  101      10
1  2021-01-01  2021-10-01  102      11
Note: dictionary unpacking (**) is needed because the new column names contain spaces, so they cannot be written directly as keyword arguments.
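One caveat with the approach above: Date holds strings, so min and max compare them lexicographically. That gives the right answer here only because the values are in ISO YYYY-MM-DD format. A minimal sketch of a variant that converts to datetime first, assuming the same sample data as in the question; pd.NamedAgg is used in place of the plain tuples purely for readability, and column selection replaces reindex as a matter of taste:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Date': ['2020-01-01', '2020-10-01', '2021-01-01', '2021-10-01'],
    'ID': [101, 101, 102, 102],
    'number': [10, 10, 11, 11],
})

# Parse the strings so min/max compare real dates, not text.
df['Date'] = pd.to_datetime(df['Date'])

new_df = (
    df.groupby(['ID', 'number'], as_index=False)
      .agg(**{
          # Keys with spaces must go through dictionary unpacking.
          'Start Date': pd.NamedAgg(column='Date', aggfunc='min'),
          'End Date': pd.NamedAgg(column='Date', aggfunc='max'),
      })
      # Optional: reorder columns to match the expected output.
      [['Start Date', 'End Date', 'ID', 'number']]
)

print(new_df)
```

If the dates arrive in a format that does not sort correctly as text (for example DD/MM/YYYY), the conversion step becomes necessary rather than optional.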