Home > Back-end >  sorting the pandas dataframe with two date-time columns on similar primary key
sorting the pandas dataframe with two date-time columns on similar primary key

Time:08-10

Initial dataframe looks like :

Primary Key     date_1        date_2
   22         2021-11-20      NaN
   22         2021-11-24      NaN
   22         NaN             2021-11-23
   22         NaN             2021-11-22 

Desired Dataframe is :

  Primary Key   date_1        date_2
   22         2021-11-20      NaN
   22         NaN             2021-11-22
   22         NaN             2021-11-23
   22         2021-11-24      NaN

trying to sort on the basis of date column

df.sort_values(['Primary Key', 'date_1', 'date_2'],inplace=True) #how to approach sorting

CodePudding user response:

You can use a fillna operation to combine your columns into a single Series, sort that Series and then use its index to re-order your original DataFrame:

out = df.reindex(
    df['date_1'].fillna(df['date_2'])
    .sort_values().index
)

print(out)
   Primary Key      date_1      date_2
0           22  2021-11-20         NaN
3           22         NaN  2021-11-22
2           22         NaN  2021-11-23
1           22  2021-11-24         NaN

If you have a concern about your Index, you can also use .reset_index and .iloc to perform this sorting

df = df.set_index('Primary Key')

indices = (
    df['date_1'].fillna(df['date_2'])
        .reset_index(drop=True)
        .sort_values().index
)

out = df.iloc[indices]
print(out)
                 date_1      date_2
Primary Key                        
22           2021-11-20         NaN
22                  NaN  2021-11-22
22                  NaN  2021-11-23
22           2021-11-24         NaN

CodePudding user response:

Here is one way - you basically take the flat list of two columns and argsort to get the indexes but take only the first four after normalizing the values to the length of the dataframe.

idx = (np.argsort(np.ravel([df['date_1'], df['date_2']])) % len(df))[:len(df)]
out = df.loc[idx]

print(out):

   Primary Key     date_1     date_2
0           22 2021-11-20        NaT
3           22        NaT 2021-11-22
2           22        NaT 2021-11-23
1           22 2021-11-24        NaT
  • Related