Merging columns with Python and Pandas

Time:04-07

I'm using Python 3.0 and Pandas to clean some data.

I have the following table:

#   Item_ID          Date_1                    Date_2
0    1857      2020-11-05 00:00:00      2020-12-05 00:00:00
1    1569      2020-12-09 00:00:00      2021-01-07 00:00:00
2    2569      2020-12-09 00:00:00      NaN
3    6587      2020-12-09 00:00:00      2021-10-08 00:00:00
4    5236      2020-12-09 00:00:00      -

Here is the code to create the dataframe, to make this easy to reproduce:

import pandas as pd

d = {'Item_ID': [1857, 1569, 2569, 6587, 5236], 'Date_1': ['2020-11-05 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00'], 'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN', '2021-10-08 00:00:00', '-']}
df = pd.DataFrame(data=d)

I would like to merge the columns 'Date_1' and 'Date_2' into a single column 'Date_3', using an efficient method because the dataframe can be really big. The expected result is:

#   Item_ID          Date_3
0    1857      2020-12-05 00:00:00
1    1569      2021-01-07 00:00:00
2    2569      2020-12-09 00:00:00
3    6587      2021-10-08 00:00:00
4    5236      2020-12-09 00:00:00

The date in 'Date_1' should be replaced by the content of 'Date_2' only if 'Date_2' contains a date (regardless of whether it is earlier or later than the date in 'Date_1').

Can this be done with a merge?

CodePudding user response:

You can build a mask with the help of pandas.to_datetime to identify which values in 'Date_2' are valid dates:

mask = pd.to_datetime(df['Date_2'], errors='coerce').isna()
df['Date_3'] = df['Date_1'].where(mask, df['Date_2'])

output:

   Item_ID               Date_1               Date_2               Date_3
0     1857  2020-11-05 00:00:00  2020-12-05 00:00:00  2020-12-05 00:00:00
1     1569  2020-12-09 00:00:00  2021-01-07 00:00:00  2021-01-07 00:00:00
2     2569  2020-12-09 00:00:00                  NaN  2020-12-09 00:00:00
3     6587  2020-12-09 00:00:00  2021-10-08 00:00:00  2021-10-08 00:00:00
4     5236  2020-12-09 00:00:00                    -  2020-12-09 00:00:00
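
To make the direction of the mask explicit, here is a minimal, self-contained sketch of the same approach on the sample data from the question. The intermediate name `parsed` is introduced only for illustration; the important point is that `Series.where` keeps the original value where the condition is True and takes the replacement where it is False.

```python
import pandas as pd

df = pd.DataFrame({
    'Item_ID': [1857, 1569, 2569, 6587, 5236],
    'Date_1': ['2020-11-05 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00',
               '2020-12-09 00:00:00', '2020-12-09 00:00:00'],
    'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN',
               '2021-10-08 00:00:00', '-'],
})

# Parse Date_2; anything that cannot be read as a date becomes NaT.
parsed = pd.to_datetime(df['Date_2'], errors='coerce')

# True where Date_2 is NOT a valid date, i.e. where Date_1 must be kept.
mask = parsed.isna()
print(mask.tolist())  # [False, False, True, False, True]

# Keep Date_1 where mask is True, otherwise take Date_2 (values stay strings).
df['Date_3'] = df['Date_1'].where(mask, df['Date_2'])
print(df[['Item_ID', 'Date_3']])
```

Both steps are vectorized, so this should scale to large dataframes much better than a row-by-row loop.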

Or, if you want 'Date_3' to have a datetime type:

df['Date_3'] = pd.to_datetime(df['Date_2'], errors='coerce').fillna(df['Date_1'])

output:

   Item_ID               Date_1               Date_2     Date_3
0     1857  2020-11-05 00:00:00  2020-12-05 00:00:00 2020-12-05
1     1569  2020-12-09 00:00:00  2021-01-07 00:00:00 2021-01-07
2     2569  2020-12-09 00:00:00                  NaN 2020-12-09
3     6587  2020-12-09 00:00:00  2021-10-08 00:00:00 2021-10-08
4     5236  2020-12-09 00:00:00                    - 2020-12-09
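
A slightly more defensive variant of the datetime version, as a sketch: it parses both columns explicitly before filling, so the result does not depend on how `fillna` handles string values. It assumes that 'Date_1' always holds parseable dates, as in the sample data; the names `date_1` and `date_2` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Item_ID': [1857, 1569, 2569, 6587, 5236],
    'Date_1': ['2020-11-05 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00',
               '2020-12-09 00:00:00', '2020-12-09 00:00:00'],
    'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN',
               '2021-10-08 00:00:00', '-'],
})

# Convert both columns to datetime; invalid entries become NaT.
date_1 = pd.to_datetime(df['Date_1'], errors='coerce')
date_2 = pd.to_datetime(df['Date_2'], errors='coerce')

# Prefer Date_2 and fall back to Date_1 where Date_2 is missing or invalid.
df['Date_3'] = date_2.fillna(date_1)

# Confirm that the new column really is a datetime column.
print(pd.api.types.is_datetime64_any_dtype(df['Date_3']))  # True
```

If 'Date_1' can also contain invalid values, the corresponding rows of 'Date_3' end up as NaT and would need separate handling.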

Alternatively, to output only selected columns together with 'Date_3':

keep = ['Item_ID']
df[keep].join(pd.to_datetime(df['Date_2'], errors='coerce')
                .fillna(df['Date_1'])
                .rename('Date_3')
              )

   Item_ID     Date_3
0     1857 2020-12-05
1     1569 2021-01-07
2     2569 2020-12-09
3     6587 2021-10-08
4     5236 2020-12-09

CodePudding user response:

You can do something like the following:

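from datetime import datetime
import pandas as pd

def datetimeChecker(date):
    # Return True only if the value matches the expected date format.
    try:
        datetime.strptime(date, "%Y-%m-%d %H:%M:%S")
        return True
    except (ValueError, TypeError):
        # ValueError: the string is not a date; TypeError: the value is not a string.
        return False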

d = {'Item_ID': [1857, 1569, 2569, 6587, 5236], 'Date_1': ['2020-11-05 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00'], 'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN', '2021-10-08 00:00:00', '-']}
df = pd.DataFrame(data=d)

df["final_date"] = df.apply(lambda x: x['Date_2'] if datetimeChecker(x['Date_2']) else x['Date_1'], axis=1)

The output looks like this:

   Item_ID               Date_1               Date_2           final_date
0     1857  2020-11-05 00:00:00  2020-12-05 00:00:00  2020-12-05 00:00:00
1     1569  2020-12-09 00:00:00  2021-01-07 00:00:00  2021-01-07 00:00:00
2     2569  2020-12-09 00:00:00                  NaN  2020-12-09 00:00:00
3     6587  2020-12-09 00:00:00  2021-10-08 00:00:00  2021-10-08 00:00:00
4     5236  2020-12-09 00:00:00                    -  2020-12-09 00:00:00
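
Since the question asks for an efficient method, note that `df.apply(..., axis=1)` calls the Python function once per row. A possible vectorized equivalent of the same strict-format check is sketched below; the names `FMT` and `is_date` are introduced here for illustration and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Item_ID': [1857, 1569, 2569, 6587, 5236],
    'Date_1': ['2020-11-05 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00',
               '2020-12-09 00:00:00', '2020-12-09 00:00:00'],
    'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN',
               '2021-10-08 00:00:00', '-'],
})

# Same format string as in the strptime-based checker.
FMT = '%Y-%m-%d %H:%M:%S'

# True where Date_2 parses with the given format; other values become NaT.
is_date = pd.to_datetime(df['Date_2'], format=FMT, errors='coerce').notna()

# Take Date_2 where it is a valid date, otherwise fall back to Date_1.
df['final_date'] = df['Date_2'].where(is_date, df['Date_1'])
print(df)
```

On the sample data this gives the same 'final_date' column as the apply-based version, with the values kept as strings.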