Home > other >  Concatenating many time and date columns
Concatenating many time and date columns

Time:11-14

I have many date and time column pairs (around 15 each) that share the same prefix, ie. SH or DEL. The columns are all of dtype object, ie. string. All the columns belong to the same Dataframe. Here is an example of two such pairs:

  SH_DATE         DEL_DATE          SH_TIME      DEL_TIME                

 2020-04-22      2020-04-27         19:42:00     19:11:00           
 2020-04-22      2020-04-25         19:42:00     19:26:00           
 2020-04-24      2020-04-24         09:55:00     09:55:00           
 2020-04-24      2020-04-26         14:27:00     14:27:00           

I'd like to combine each date/time column pair into a single column. My first inclination was to utilize LIKE statements for time and date columns in the dataframe, by extracting date/time in their own distinct lists, and then looping over them for the pd.concat function, but haven't found a solution yet.

Example of desired output:

     SH_DATETIME            DEL_DATETIME                        

 2020-04-22 19:42:00     2020-04-27 19:11:00                
 2020-04-22 19:42:00     2020-04-25 19:26:00                    
 2020-04-24 09:55:00     2020-04-24 09:55:00                    
 2020-04-24 14:27:00     2020-04-26 14:27:00          

CodePudding user response:

Determine the unique column name prefixes (e.g. 'SH' or 'DEL') and combine the respective date / time columns with a space as separator. Parse the combined to datetime.

pfxs = df.columns.to_series().str.replace('_TIME', '').str.replace('_DATE', '').unique()

for p in pfxs:
    if f'{p}_DATE' in df.columns and f'{p}_TIME' in df.columns:
        df[f'{p}_DATETIME'] = pd.to_datetime(df[f'{p}_DATE'] ' ' df[f'{p}_TIME'])
df
      SH_DATE    DEL_DATE  ...         SH_DATETIME        DEL_DATETIME
0  2020-04-22  2020-04-27  ... 2020-04-22 19:42:00 2020-04-27 19:11:00
1  2020-04-22  2020-04-25  ... 2020-04-22 19:42:00 2020-04-25 19:26:00
2  2020-04-24  2020-04-24  ... 2020-04-24 09:55:00 2020-04-24 09:55:00
3  2020-04-24  2020-04-26  ... 2020-04-24 14:27:00 2020-04-26 14:27:00

To make this somewhat fail-safe, you can add checks to make sure a date and a time column exists for given prefix before attempting to combine.

In case some elements in your date & time columns cannot be parsed to datetime, setting errors='coerce' might be an option, leaving you with NaT for elements that fail to parse, e.g.

pd.to_datetime(df[f'{p}_DATE'] ' ' df[f'{p}_TIME'], errors='coerce')

in the code above.

  • Related