How to apply .sort() with a key=lambda function to every row of a dataframe on a single column?-CodePudding

I have a dataframe with a column containing a list of dates:

data = [
    [
        1,
        [
            "2017-12-06",
            "2017-12-05",
            "2017-12-06",
            "2018-01-03",
            "2018-01-04",
            "2017-11-24",
        ],
    ],
    [
        2,
        [
            "2019-03-10",
            "2018-12-03",
            "2018-12-04",
            "2018-11-08",
            "2018-11-30",
            "2019-03-22",
            "2018-11-24",
            "2019-03-06",
            "2017-11-16",
        ],
    ],
]
df = pd.DataFrame(data, columns=["id", "dates"])
df

id  dates
1   [2017-12-06, 2017-12-05, 2017-12-06, 2018-01-03, 2018-01-04, 2017-11-24]
2   [2019-03-10, 2018-12-03, 2018-12-04, 2018-11-08, 2018-11-30, 2019-03-22, 2018-11-24, 2019-03-06, 2017-11-16]

print(df.dtypes)
id        int64
dates    object
dtype: object

I would like to sort the date containing column (dates). I have tried a number of methods with no success (including .apply(list.sort) in place). The only method that I've found that works is using .sort(key = ....) like below:

import datetime
from datetime import datetime

dates = [
    "2019-03-10",
    "2018-12-03",
    "2018-12-04",
    "2018-11-08",
    "2018-11-30",
    "2019-03-22",
    "2018-11-24",
    "2019-03-06",
    "2017-11-16",
]

dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))

but I can only get it to work on a list and I want to apply this to that entire column in the dataframe df. Can anyone advise the best way to do this? Or perhaps there is an even better way to sort this column?

CodePudding user response：

What I see here is that you want the list in every row to be sorted (not the column itself).

The code below applies a certain function (something like my_sort()) to each row of column "dates":

df['dates'].apply(my_sort)

You just need to implement my_sort to be applied to the list in each row. Something like:

def my_sort(dates):
    dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
    return dates

CodePudding user response：

You can use .apply() to apply a given function (in this case 'sort') to every row of a dataframe column.

This should work:

df['dates'].apply(lambda row: row.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d")))

print(df)

returns:

   id                                              dates
0   1  ['2017-11-24', '2017-12-05', '2017-12-06', '2017-12-06', '2018-01-03', '2018-01-04']
1   2  ['2017-11-16', '2018-11-08', '2018-11-24', '2018-11-30', '2018-12-03', '2018-12-04', '2019-03-06', '2019-03-10', '2019-03-22']

Note that in this case the code df['data'] = df['data'].apply(...) will NOT work because the sort function has a default inplace=True parameter: it directly modifies the dataframe and doesn't create a new one. To apply other functions you might have to use the df = df.apply(etc)formulation.