I have a dataframe such as:
data | more data | 2010-01-01 | 2011-01-01 |
---|---|---|---|
abc | xyz | 123 | 456 |
xyz | abc | 123 | 456 |
and I would like to sort the columns in a certain order. I tried using sort_values and passing a list with the order I want them in but it is not recognizing '2010-01-01' or any other dates in the list. I assume it is because they are in date format, so how would I sort these labels?
What I currently am trying:
order = ['more data', 'data', '2011-01-01', '2010-01-01']
df = df.sort_values(order)
CodePudding user response:
I'm not pretty sure how you want to handle non-datetime columns. I assume that you prefer these columns to show at the beginning of the dataframe. With that in mind, I suggest you use sorted
function:
import time
import datetime
def sort_date(value):
try:
return -1 * time.mktime(datetime.datetime.strptime(value, "%Y-%m-%d").timetuple())
except:
return float("-inf")
columns = ['data', '2011-01-01', "value", '2010-01-01']
sorted(columns, key=sort_date)
The abovementioned code has nothing to do with pandas
dataframe - rather, gives you a better idea of how to handle such a situation. The sorted
function mentioned returns the following output:
['data', 'value', '2011-01-01', '2010-01-01']
Therefore, you can simply use the returned value of the abovementioned code to sort your columns out:
df[sorted(columns, key=sort_date)]
Note that you can access the columns of the dataframe by simply using df.columns
.
CodePudding user response:
If you only want to re-sort the date columns and not any of the others you have to write a bit more code. Something like:
# Separate date columns from non-date columns
non_date_cols = ['name']
date_cols = [col for col in df.columns if col not in non_date_cols] # Get a list of the date column names
# Sort the column names (not the actual columns)
date_cols.sort()
# Rebuild your dataframe by taking the non date cols and joining the rest of the columns in the order specified
df_sorted = df[non_date_cols date_cols]