I have a dataframe that has two date containing columns I'd like to perform the following operations on:
- Concatenate into a NEW column.
- Get the unique values (no redundant dates).
data = [
[
"2018-12-08",
"2018-12-09",
],
["2020-12-19", "2020-12-20"],
["2020-12-19", "2020-12-19"],
["2020-10-06", "2020-10-12"],
]
df = pd.DataFrame(data, columns=["date1", "date2"])
df[["date1", "date2"]].apply(lambda x: pd.to_datetime(x).dt.date)
The challenge I am facing is finding a clear concise way to do this.
To create one column I am doing this:
df['date_range'] = df[['date1', 'date2']].astype(str).values.tolist()
but I am not sure how to get the unique values from each row?
My desired dataframe looks like this:
date1 date2 date_range
2018-12-08 2018-12-09 2018-12-08,2018-12-09
2018-12-19 2018-12-20 2018-12-19,2018-12-20
2018-12-19 2018-12-19 2018-12-19
2018-10-06 2018-10-12 2018-10-06,2018-10-12
CodePudding user response:
You can use set()
to find the unique elements in each row, and a list comprehension to generate your desired result, joining each unique list together with a comma. Something like
df['date_range'] = [','.join(list(set(dates))) for dates in df[['date1', 'date2']].astype(str).values]
Also, you can drop the .tolist()
!
CodePudding user response:
This should work. Usings.unique()
preserves the order of the series, so if the order of the start and end dates is important, than this would be a solution.
df['date_range'] = df[['date1', 'date2']].apply(lambda x: x.unique(),axis=1).str.join(', ')