I have a dataframe with a column containing a list of dates:
data = [
[
1,
[
"2017-12-06",
"2017-12-05",
"2017-12-06",
"2018-01-03",
"2018-01-04",
"2017-11-24",
],
],
[
2,
[
"2019-03-10",
"2018-12-03",
"2018-12-04",
"2018-11-08",
"2018-11-30",
"2019-03-22",
"2018-11-24",
"2019-03-06",
"2017-11-16",
],
],
]
df = pd.DataFrame(data, columns=["id", "dates"])
df
id dates
1 [2017-12-06, 2017-12-05, 2017-12-06, 2018-01-03, 2018-01-04, 2017-11-24]
2 [2019-03-10, 2018-12-03, 2018-12-04, 2018-11-08, 2018-11-30, 2019-03-22, 2018-11-24, 2019-03-06, 2017-11-16]
print(df.dtypes)
id int64
dates object
dtype: object
I would like to sort the date containing column (dates
). I have tried a number of methods with no success (including .apply(list.sort) in place
). The only method that I've found that works is using .sort(key = ....)
like below:
import datetime
from datetime import datetime
dates = [
"2019-03-10",
"2018-12-03",
"2018-12-04",
"2018-11-08",
"2018-11-30",
"2019-03-22",
"2018-11-24",
"2019-03-06",
"2017-11-16",
]
dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
but I can only get it to work on a list and I want to apply this to that entire column in the dataframe df
. Can anyone advise the best way to do this? Or perhaps there is an even better way to sort this column?
CodePudding user response:
What I see here is that you want the list in every row to be sorted (not the column itself).
The code below applies a certain function (something like my_sort()
) to each row of column "dates":
df['dates'].apply(my_sort)
You just need to implement my_sort
to be applied to the list in each row. Something like:
def my_sort(dates):
dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
return dates
CodePudding user response:
You can use .apply() to apply a given function (in this case 'sort') to every row of a dataframe column.
This should work:
df['dates'].apply(lambda row: row.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d")))
print(df)
returns:
id dates
0 1 ['2017-11-24', '2017-12-05', '2017-12-06', '2017-12-06', '2018-01-03', '2018-01-04']
1 2 ['2017-11-16', '2018-11-08', '2018-11-24', '2018-11-30', '2018-12-03', '2018-12-04', '2019-03-06', '2019-03-10', '2019-03-22']
Note that in this case the code df['data'] = df['data'].apply(...)
will NOT work because the sort function has a default inplace=True parameter: it directly modifies the dataframe and doesn't create a new one.
To apply other functions you might have to use the df = df.apply(etc)
formulation.