Home > Blockchain >  How to combine date-containing columns into new column with unique dates?
How to combine date-containing columns into new column with unique dates?

Time:07-13

I have a dataframe that has two date containing columns I'd like to perform the following operations on:

  1. Concatenate into a NEW column.
  2. Get the unique values (no redundant dates).
data = [
    [
        "2018-12-08",
        "2018-12-09",
    ],
    ["2020-12-19", "2020-12-20"],
    ["2020-12-19", "2020-12-19"],
    ["2020-10-06", "2020-10-12"],
]

df = pd.DataFrame(data, columns=["date1", "date2"])
df[["date1", "date2"]].apply(lambda x: pd.to_datetime(x).dt.date)

The challenge I am facing is finding a clear concise way to do this.

To create one column I am doing this:

df['date_range'] = df[['date1', 'date2']].astype(str).values.tolist()

but I am not sure how to get the unique values from each row?

My desired dataframe looks like this:

date1       date2            date_range
2018-12-08  2018-12-09  2018-12-08,2018-12-09
2018-12-19  2018-12-20  2018-12-19,2018-12-20
2018-12-19  2018-12-19  2018-12-19
2018-10-06  2018-10-12  2018-10-06,2018-10-12

CodePudding user response:

You can use set() to find the unique elements in each row, and a list comprehension to generate your desired result, joining each unique list together with a comma. Something like

df['date_range'] = [','.join(list(set(dates))) for dates in df[['date1', 'date2']].astype(str).values]

Also, you can drop the .tolist()!

CodePudding user response:

This should work. Usings.unique() preserves the order of the series, so if the order of the start and end dates is important, than this would be a solution.

df['date_range'] = df[['date1', 'date2']].apply(lambda x: x.unique(),axis=1).str.join(', ')
  • Related