How to combine date-containing columns into new column with unique dates?

I have a DataFrame with two date-containing columns, and I'd like to perform the following operations on them:

Concatenate into a NEW column.
Get the unique values (no redundant dates).

import pandas as pd

data = [
    ["2018-12-08", "2018-12-09"],
    ["2020-12-19", "2020-12-20"],
    ["2020-12-19", "2020-12-19"],
    ["2020-10-06", "2020-10-12"],
]

df = pd.DataFrame(data, columns=["date1", "date2"])
df[["date1", "date2"]].apply(lambda x: pd.to_datetime(x).dt.date)
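Note that `DataFrame.apply` returns a new DataFrame rather than modifying `df` in place, so the conversion line above has no lasting effect unless its result is assigned back. A minimal sketch of the assigned version, assuming the goal is to store `datetime.date` objects; string operations such as joining then need an explicit `astype(str)` first:

```python
import datetime

import pandas as pd

data = [
    ["2018-12-08", "2018-12-09"],
    ["2020-12-19", "2020-12-20"],
    ["2020-12-19", "2020-12-19"],
    ["2020-10-06", "2020-10-12"],
]
df = pd.DataFrame(data, columns=["date1", "date2"])

# Assign the result back; apply() does not modify df in place.
df[["date1", "date2"]] = df[["date1", "date2"]].apply(lambda x: pd.to_datetime(x).dt.date)

# Each cell now holds a datetime.date object (the column dtype is object).
print(type(df.loc[0, "date1"]) is datetime.date)  # True

# astype(str) turns the dates back into ISO "YYYY-MM-DD" strings for joining.
print(df[["date1", "date2"]].astype(str).iloc[0].tolist())  # ['2018-12-08', '2018-12-09']
```

This is why the answers below call `astype(str)` before joining: `str.join` only accepts strings.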

The challenge I am facing is finding a clear, concise way to do this.

To create the combined column, I am doing this:

df['date_range'] = df[['date1', 'date2']].astype(str).values.tolist()

but I am not sure how to get the unique values from each row.

My desired dataframe looks like this:

date1       date2       date_range
2018-12-08  2018-12-09  2018-12-08,2018-12-09
2020-12-19  2020-12-20  2020-12-19,2020-12-20
2020-12-19  2020-12-19  2020-12-19
2020-10-06  2020-10-12  2020-10-06,2020-10-12

Answer:

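You can use set() to find the unique elements in each row, and a list comprehension to generate the desired result, joining each row's unique values with a comma. Because sets have no guaranteed order, wrap the set in sorted() so the dates come out in a consistent order (chronological, since ISO-formatted date strings sort correctly):

df['date_range'] = [','.join(sorted(set(dates))) for dates in df[['date1', 'date2']].astype(str).values]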

Also, you don't need the .tolist() call: iterating over .values already yields one row at a time.
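If the dates should stay in column order (date1 first) rather than sorted order, `dict.fromkeys` is a swapped-in alternative to `set()`: it drops repeated keys but keeps first-seen order, because dicts preserve insertion order in Python 3.7+. A minimal sketch, assuming the same sample data as the question:

```python
import pandas as pd

data = [
    ["2018-12-08", "2018-12-09"],
    ["2020-12-19", "2020-12-20"],
    ["2020-12-19", "2020-12-19"],
    ["2020-10-06", "2020-10-12"],
]
df = pd.DataFrame(data, columns=["date1", "date2"])

# dict.fromkeys deduplicates each row while keeping the original column order.
df["date_range"] = [
    ",".join(dict.fromkeys(row))
    for row in df[["date1", "date2"]].astype(str).values
]

print(df["date_range"].tolist())
# ['2018-12-08,2018-12-09', '2020-12-19,2020-12-20', '2020-12-19', '2020-10-06,2020-10-12']
```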

Answer:

This should work. Using .unique() preserves the order in which values appear in each row, so if the order of the start and end dates is important, then this would be a solution.

df['date_range'] = df[['date1', 'date2']].apply(lambda x: x.unique(), axis=1).str.join(',')
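Because there are exactly two columns, the row-wise `apply` can also be avoided: compare the columns directly and append `date2` only where it differs from `date1`. A minimal sketch using `numpy.where` (a swapped-in vectorized technique, not part of either answer), assuming the columns hold date strings or values whose `astype(str)` gives ISO dates:

```python
import numpy as np
import pandas as pd

data = [
    ["2018-12-08", "2018-12-09"],
    ["2020-12-19", "2020-12-20"],
    ["2020-12-19", "2020-12-19"],
    ["2020-10-06", "2020-10-12"],
]
df = pd.DataFrame(data, columns=["date1", "date2"])

d1 = df["date1"].astype(str)
d2 = df["date2"].astype(str)

# Where the two dates match, keep a single copy; otherwise join them with a comma.
df["date_range"] = np.where(d1 == d2, d1, d1 + "," + d2)

print(df["date_range"].tolist())
# ['2018-12-08,2018-12-09', '2020-12-19,2020-12-20', '2020-12-19', '2020-10-06,2020-10-12']
```

Like `.unique()`, this keeps date1 before date2; it just does the comparison on whole columns at once instead of calling a Python function per row.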