Ho to merge 2 columns containing string dates and None into one column-CodePudding

I've got this sample data frame. Each use has 2 rows.

They have an arrival and a departure date, and one of them is always None. The dates are string.

This is what my data currently looks like:

traveller_id	arrival	departure
282840560712311	2022-10-20	None
282840560712311	None	2022-10-23
439863739170884	2022-12-22	None
439863739170884	None	2022-12-25

    import pandas as pd
    import numpy as np
    
df = pd.DataFrame(data = {'traveller_id': [712311, 712311, 170884, 170884]
                          , 'arrival': ['2022-10-20', None, '2022-12-22', None]
                          , 'departure': [None, '2022-10-23', None, '2022-12-25']
                          })

The goal is to have only 1 row per user, with dates in the other columns (and no None).

It should look like this:

traveller_id	arrival	departure
282840560712311	2022-10-20	2022-10-23
439863739170884	2022-12-22	2022-12-25

CodePudding user response：

Replace None with pd.NaT and then do an agg with max after groupby traveller_id:

df.replace({None:pd.NaT}).groupby('traveller_id', as_index=False).agg(max)

output on your example from constructor:

   traveller_id    arrival  departure
0        170884 2022-12-22 2022-12-25
1        712311 2022-10-20 2022-10-23

I assumed the strings as dates. If they are strings, yes you can convert them as dates first and no need to replace None with pd.NaT.

So with that:

df['arrival'] = pd.to_datetime(df['arrival'])
df['departure'] = pd.to_datetime(df['departure'])
df = df.groupby('traveller_id', as_index=False).agg(max)