Initial dataframe looks like :
Primary Key date_1 date_2
22 2021-11-20 NaN
22 2021-11-24 NaN
22 NaN 2021-11-23
22 NaN 2021-11-22
Desired Dataframe is :
Primary Key date_1 date_2
22 2021-11-20 NaN
22 NaN 2021-11-22
22 NaN 2021-11-23
22 2021-11-24 NaN
trying to sort on the basis of date column
df.sort_values(['Primary Key', 'date_1', 'date_2'],inplace=True) #how to approach sorting
CodePudding user response:
You can use a fillna
operation to combine your columns into a single Series
, sort that Series
and then use its index to re-order your original DataFrame
:
out = df.reindex(
df['date_1'].fillna(df['date_2'])
.sort_values().index
)
print(out)
Primary Key date_1 date_2
0 22 2021-11-20 NaN
3 22 NaN 2021-11-22
2 22 NaN 2021-11-23
1 22 2021-11-24 NaN
If you have a concern about your Index
, you can also use .reset_index
and .iloc
to perform this sorting
df = df.set_index('Primary Key')
indices = (
df['date_1'].fillna(df['date_2'])
.reset_index(drop=True)
.sort_values().index
)
out = df.iloc[indices]
print(out)
date_1 date_2
Primary Key
22 2021-11-20 NaN
22 NaN 2021-11-22
22 NaN 2021-11-23
22 2021-11-24 NaN
CodePudding user response:
Here is one way - you basically take the flat list of two columns and argsort to get the indexes but take only the first four after normalizing the values to the length of the dataframe.
idx = (np.argsort(np.ravel([df['date_1'], df['date_2']])) % len(df))[:len(df)]
out = df.loc[idx]
print(out):
Primary Key date_1 date_2
0 22 2021-11-20 NaT
3 22 NaT 2021-11-22
2 22 NaT 2021-11-23
1 22 2021-11-24 NaT