How to remove only consecutive duplicate rows and replace values based on another column with pandas

Time:08-11

import pandas as pd

df = pd.DataFrame(
    [
        ['China', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', '09/04/2022 21:38', '09/06/2022 21:38']
    ],
    columns=['Country', 'Arrival', 'Departure']
)

Suppose I have the DataFrame above. I want to collapse consecutive duplicate rows (rows with the same Country that directly follow one another) into a single row, keeping the first row's Arrival and replacing Departure with the value from the last row of the run. Duplicates that are not consecutive should not be merged.

After the removal, it should look like this:

df = pd.DataFrame(
    [
        ['China', '08/06/2022 20:00', '8/14/2022 12:54'],
        ['United Kingdom', '8/27/2022 06:36', '09/02/2022 21:38'],
        ['China', '09/04/2022 21:38', '09/06/2022 21:38']
    ],
    columns=['Country', 'Arrival', 'Departure']
)

CodePudding user response:

You can try

# Label each run of consecutive identical countries, then build a one-row
# DataFrame per run: Arrival from the run's first row, Departure from its last row.
out = (df.groupby(df['Country'].ne(df['Country'].shift()).cumsum(), group_keys=False)
       .apply(lambda g: pd.DataFrame(
           [[g['Country'].iloc[0], g['Arrival'].iloc[0], g['Departure'].iloc[-1]]],
           columns=g.columns))
       .reset_index(drop=True))
print(out)

          Country           Arrival         Departure
0           China  08/06/2022 20:00   8/14/2022 12:54
1  United Kingdom   8/27/2022 06:36  09/02/2022 21:38
2           China  09/04/2022 21:38  09/06/2022 21:38
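
The grouping key `df['Country'].ne(df['Country'].shift()).cumsum()` is what keeps non-consecutive duplicates apart. Here is a minimal sketch of what it computes, using only the Country values from the question (the names `starts_new_run` and `run_id` are just illustrative):

```python
import pandas as pd

country = pd.Series(
    ['China', 'China', 'China', 'United Kingdom', 'United Kingdom', 'China'],
    name='Country',
)

# True wherever a value differs from the previous row. The first row is
# always True because shift() leaves a missing value there.
starts_new_run = country.ne(country.shift())

# The running total of those flags gives each consecutive run its own label.
run_id = starts_new_run.cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 3]
```

The last China row gets label 3 rather than 1, so it ends up in its own group instead of being merged with the first three rows.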

CodePudding user response:

IIUC, use:

group = df['Country'].ne(df['Country'].shift()).cumsum()
out = (df.groupby(group, as_index=False)
         .agg({'Country': 'first', 'Arrival': 'first', 'Departure': 'last'})
       )

output:

          Country           Arrival         Departure
0           China  08/06/2022 20:00   8/14/2022 12:54
1  United Kingdom   8/27/2022 06:36  09/02/2022 21:38
2           China  09/04/2022 21:38  09/06/2022 21:38
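
If this is needed in more than one place, the same idea can be wrapped in a small helper. The following is only a sketch: `collapse_runs` and its parameter names are hypothetical, it uses named aggregation instead of the dict form, and it assumes the rows are already in chronological order, since 'first' and 'last' depend on row order rather than on the timestamps themselves.

```python
import pandas as pd

def collapse_runs(frame, key, first_col, last_col):
    """Collapse each run of consecutive equal `key` values into one row.

    Hypothetical helper: keeps `first_col` from the first row of the run
    and `last_col` from the last row of the run.
    """
    # Give every consecutive run its own integer label.
    run_id = frame[key].ne(frame[key].shift()).cumsum().rename('run_id')
    return (
        frame.groupby(run_id)
        .agg(**{
            key: (key, 'first'),
            first_col: (first_col, 'first'),
            last_col: (last_col, 'last'),
        })
        .reset_index(drop=True)  # drop the run labels from the result
    )

df = pd.DataFrame(
    [
        ['China', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', '09/04/2022 21:38', '09/06/2022 21:38'],
    ],
    columns=['Country', 'Arrival', 'Departure'],
)

out = collapse_runs(df, 'Country', 'Arrival', 'Departure')
print(out)
```

Renaming the key to `run_id` keeps it from sharing a name with the Country column, and `reset_index(drop=True)` discards it once the groups are built.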