Home > database >  Dataframe date sorting is reversed. How to fix it?
Dataframe date sorting is reversed. How to fix it?

Time:06-06

So, I have a dataframe (mean_df) with a very messy column with dates. It's messy because it is in this format: 1/1/2018, 1/2/2018, 1/3/2018.... When it should be 01/01/2018, 02/01/2018, 03/01/2018... Not only has the wrong format, but it's ascending by the first day of every month, and then following second day of every month, and so on... So I wrote this code to fix the format:

mean_df["Date"] = mean_df["Date"].astype('datetime64[ns]')
mean_df["Date"] = mean_df["Date"].dt.strftime('%d-%m-%Y')

Then, from displaying this:

wrong format

It's now showing this (I have to run the same cell 3 times to make it work, it always throws error the first time):

enter image description here

Finally, in the last few hours I've been trying to sort the 'Dates' column, in an ascending way, but it keeps sorting it the wrong way:

mean_df = mean_df.sort_values(by='Date') # I tried this

But this is the output:

enter image description here

As you can see, it is still ascending prioritizing days. Can someone guide me in the right direction? Thank you in advance!

CodePudding user response:

Make it into right format

mean_df["sort_date"] = pd.to_datetime(mean_df["Date"],format = '%d/%m/%Y')
mean_df = mean_df.sort_values(by='sort_date') # Try this now

CodePudding user response:

You should sort the date just after convert it to datetime since dt.strftime convert datetime to string

mean_df["Date"] = pd.to_datetime(mean_df["Date"], dayfirst=True)
mean_df = mean_df.sort_values(by='Date')
mean_df["Date"] = mean_df["Date"].dt.strftime('%d-%m-%Y')

CodePudding user response:

Here is my sample code.

import pandas as pd
df = pd.DataFrame()
df['Date'] = "1/1/2018, 1/2/2018, 1/3/2018".split(", ")
df['Date1'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Date2'] = df['Date1'].dt.strftime('%d/%m/%Y')
df.sort_values(by='Date2')

First, I convert Date to datetime format. As I observed, you data follows '%d/%m/%Y' format. If you want to show data in another form, try the following line, for example

df['Date2'] = df['Date1'].dt.strftime('%d/%m/%Y')
  • Related