df = pd.read_excel(filepath)
df['Timestamp'] = pd.to_datetime(df['Date'] ' ' df['Time'])
The above code results in an error. The 'Date' column is being automatically identified as datetime64[ns] data type, while the 'Time' column is only recognized as a string. I know the method above only works if both the 'Date' and 'Time' columns are strings. Is there a better way to combine these two columns into a timestamp?
Edit to add info about sample data:
The sample data has the 'Date' column shown in Excel as the YYYY-MM-DD format, and the 'Time' column shown in Excel as the HH:MM format.
df = pd.read_excel(filepath)
df.info()
'Date' has the data type datetime64[ns] 'Time' has the data type object
When I try to convert the 'Date' column data type to object, the format of the 'Date' column changes in the dataframe.
df['Date'] = df[' Date'].astype(str)
Data type for 'Date' does change to object successfully. However when I check the dataframe, the column 'Date' is now displayed as YYYY-MM-DD HH:MM:SS.
This format for the 'Date' column will not work to combine the 'Date' and 'Time' using pd.to_datetime when concatenating the 'Date' and 'Time' strings
CodePudding user response:
I hope this code helps
import numpy as np
import pandas as pd
from io import StringIO
a = StringIO('''col1,col2
2022-09-22,11:03:00
2022-01-21,12:03:00
''')
df = pd.read_csv(a,sep=',', parse_dates={'datetime': [0,1]})
print(df.info())
print(df)
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 datetime 2 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 144.0 bytes
None
datetime
0 2022-09-22 11:03:00
1 2022-01-21 12:03:00
CodePudding user response:
I don't know the exact format you have for either columns, but try this:
df = pd.read_excel(filepath)
df["Timestamp"] = pd.to_datetime(df["Date"] df["Time"], format="%d%b%Y%H:%M:%S")
df["Timestamp"] = df["Timestamp"].dt.strftime("%m/%d/%Y %H:%M:%S")
Hope it helps.