Home > Enterprise >  Creating Timestamp from a date column and time column in a dataframe created from read_excel
Creating Timestamp from a date column and time column in a dataframe created from read_excel

Time:09-15

df = pd.read_excel(filepath)
df['Timestamp'] = pd.to_datetime(df['Date']   ' '   df['Time'])

The above code results in an error. The 'Date' column is being automatically identified as datetime64[ns] data type, while the 'Time' column is only recognized as a string. I know the method above only works if both the 'Date' and 'Time' columns are strings. Is there a better way to combine these two columns into a timestamp?

Edit to add info about sample data:

The sample data has the 'Date' column shown in Excel as the YYYY-MM-DD format, and the 'Time' column shown in Excel as the HH:MM format.

df = pd.read_excel(filepath)
df.info()

'Date' has the data type datetime64[ns] 'Time' has the data type object

When I try to convert the 'Date' column data type to object, the format of the 'Date' column changes in the dataframe.

df['Date'] = df[' Date'].astype(str)

Data type for 'Date' does change to object successfully. However when I check the dataframe, the column 'Date' is now displayed as YYYY-MM-DD HH:MM:SS.

This format for the 'Date' column will not work to combine the 'Date' and 'Time' using pd.to_datetime when concatenating the 'Date' and 'Time' strings

CodePudding user response:

I hope this code helps

import numpy as np
import pandas as pd
from io import StringIO

a = StringIO('''col1,col2
2022-09-22,11:03:00
2022-01-21,12:03:00
''')
df = pd.read_csv(a,sep=',', parse_dates={'datetime': [0,1]})
print(df.info())
print(df)

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   datetime  2 non-null      datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 144.0 bytes
None
             datetime
0 2022-09-22 11:03:00
1 2022-01-21 12:03:00

CodePudding user response:

I don't know the exact format you have for either columns, but try this:

df = pd.read_excel(filepath)

df["Timestamp"] = pd.to_datetime(df["Date"]   df["Time"], format="%d%b%Y%H:%M:%S")

df["Timestamp"] = df["Timestamp"].dt.strftime("%m/%d/%Y %H:%M:%S")

Hope it helps.

  • Related