How to change the structure of a pd DataFrame adding row values to columns?

Time:11-07

I have the following data frame:

import pandas as pd

df = pd.DataFrame({'Date': ['2020-01-01', '2020-10-01', '2021-01-01', '2021-10-01'],
                   'ID': [101, 101, 102, 102],
                   'number': [10, 10, 11, 11]})

# currently looking like this

         Date   ID  number
0  2020-01-01  101      10
1  2020-10-01  101      10
2  2021-01-01  102      11
3  2021-10-01  102      11

Is there a way to restructure the data frame so that it has the following form?

   Start Date    End Date   ID  number
0  2020-01-01  2020-10-01  101      10
1  2021-01-01  2021-10-01  102      11

CodePudding user response:

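We can use named aggregation with groupby aggregate to get the min and max Date for each (ID, number) group and assign them to new column names. Because the dates are ISO-formatted strings, min and max order them chronologically. The reindex call is optional; it only reorders the columns to match the expected output: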

new_df = df.groupby(
    ['ID', 'number'], as_index=False
).aggregate(
    **{'Start Date': ('Date', 'min'), 'End Date': ('Date', 'max')}
).reindex(columns=['Start Date', 'End Date', 'ID', 'number'])

new_df:

   Start Date    End Date   ID  number
0  2020-01-01  2020-10-01  101      10
1  2021-01-01  2021-10-01  102      11

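*Dictionary unpacking (**) is needed here because keyword argument names must be valid Python identifiers, so new column names containing spaces cannot be written as plain keywords.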
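
To illustrate the footnote, here is a minimal sketch of an alternative that avoids unpacking: aggregate into identifier-safe names first, then rename. The intermediate names start_date and end_date are made up for this example and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2020-01-01', '2020-10-01', '2021-01-01', '2021-10-01'],
                   'ID': [101, 101, 102, 102],
                   'number': [10, 10, 11, 11]})

# Plain keyword arguments work when the new names are valid identifiers.
# start_date / end_date are hypothetical intermediate names.
alt_df = (
    df.groupby(['ID', 'number'], as_index=False)
      .agg(start_date=('Date', 'min'), end_date=('Date', 'max'))
      # Rename afterwards to get the column labels with spaces
      .rename(columns={'start_date': 'Start Date', 'end_date': 'End Date'})
      # Optional: reorder columns to match the expected output
      .reindex(columns=['Start Date', 'End Date', 'ID', 'number'])
)
print(alt_df)
```

This should give the same frame as new_df above. Whether the extra rename step reads better than the ** unpacking is largely a matter of taste.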
