I have many date and time column pairs (around 15 each) that share the same prefix, ie. SH or DEL. The columns are all of dtype object, ie. string. All the columns belong to the same Dataframe
. Here is an example of two such pairs:
SH_DATE DEL_DATE SH_TIME DEL_TIME
2020-04-22 2020-04-27 19:42:00 19:11:00
2020-04-22 2020-04-25 19:42:00 19:26:00
2020-04-24 2020-04-24 09:55:00 09:55:00
2020-04-24 2020-04-26 14:27:00 14:27:00
I'd like to combine each date/time column pair into a single column. My first inclination was to utilize LIKE
statements for time and date columns in the dataframe
, by extracting date/time in their own distinct lists, and then looping over them for the pd.concat
function, but haven't found a solution yet.
Example of desired output:
SH_DATETIME DEL_DATETIME
2020-04-22 19:42:00 2020-04-27 19:11:00
2020-04-22 19:42:00 2020-04-25 19:26:00
2020-04-24 09:55:00 2020-04-24 09:55:00
2020-04-24 14:27:00 2020-04-26 14:27:00
CodePudding user response:
Determine the unique column name prefixes (e.g. 'SH' or 'DEL') and combine the respective date / time columns with a space as separator. Parse the combined to datetime.
pfxs = df.columns.to_series().str.replace('_TIME', '').str.replace('_DATE', '').unique()
for p in pfxs:
if f'{p}_DATE' in df.columns and f'{p}_TIME' in df.columns:
df[f'{p}_DATETIME'] = pd.to_datetime(df[f'{p}_DATE'] ' ' df[f'{p}_TIME'])
df
SH_DATE DEL_DATE ... SH_DATETIME DEL_DATETIME
0 2020-04-22 2020-04-27 ... 2020-04-22 19:42:00 2020-04-27 19:11:00
1 2020-04-22 2020-04-25 ... 2020-04-22 19:42:00 2020-04-25 19:26:00
2 2020-04-24 2020-04-24 ... 2020-04-24 09:55:00 2020-04-24 09:55:00
3 2020-04-24 2020-04-26 ... 2020-04-24 14:27:00 2020-04-26 14:27:00
To make this somewhat fail-safe, you can add checks to make sure a date and a time column exists for given prefix before attempting to combine.
In case some elements in your date & time columns cannot be parsed to datetime, setting errors='coerce'
might be an option, leaving you with NaT
for elements that fail to parse, e.g.
pd.to_datetime(df[f'{p}_DATE'] ' ' df[f'{p}_TIME'], errors='coerce')
in the code above.