Hi, I am not an expert in Python and I am still a beginner with pandas and working with data. I have a df with a column timestamp. The datetimes in the column look like this:
2021-09-07 16:36:14 UTC
2021-09-04 15:31:44 UTC
2021-07-15 06:49:47.320081 UTC
2021-09-07 14:55:55.353145 UTC
I would like to keep only the date and time, without the "UTC" text at the end and without the fractional seconds, and then save the DataFrame to a CSV file. Basically, I want the column in this format:
2021-09-07 16:36:14
2021-09-04 15:31:44
2021-07-15 06:49:47
2021-09-07 14:55:55
I tried these two lines:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S %Z', errors='coerce')
df['timestamp'] = df['timestamp'].dt.strftime('%Y-%m-%d %H:%M:%S')
This fixes half of the problem. The timestamps without fractional seconds are converted correctly, but the ones with fractional seconds just become empty (errors='coerce' turns values that don't match the format into NaT), as you can see below:
2021-09-07 16:36:14
2021-09-04 15:31:44
Can someone please help me with this problem?
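The empty cells appear because the format string has no %f, so the rows with fractional seconds don't match it and errors='coerce' replaces them with NaT. One possible workaround, sketched below under the assumption that every value ends with the literal " UTC" suffix, is to parse the strings once per layout and combine the two attempts. The names raw, with_fraction, without_fraction and result are illustrative only.

```python
import pandas as pd

raw = pd.Series([
    "2021-09-07 16:36:14 UTC",
    "2021-09-04 15:31:44 UTC",
    "2021-07-15 06:49:47.320081 UTC",
    "2021-09-07 14:55:55.353145 UTC",
])

# Assumption: every value ends with the literal " UTC" suffix.
no_suffix = raw.str.replace(" UTC", "", regex=False)

# Parse each layout separately; rows that don't match a format are coerced to NaT.
with_fraction = pd.to_datetime(no_suffix, format="%Y-%m-%d %H:%M:%S.%f", errors="coerce")
without_fraction = pd.to_datetime(no_suffix, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Fill the gaps left by one attempt with the values from the other.
parsed = with_fraction.fillna(without_fraction)

# %S prints whole seconds only, so the fractional part is dropped from the text.
result = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(result.tolist())
```

This keeps the original idea of explicit formats; the answers below take a simpler route by trimming the strings before parsing.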
Answer:
Try extracting the part of the string you want, i.e. the first 19 characters (YYYY-MM-DD HH:MM:SS):
import pandas as pd
# Keep only "YYYY-MM-DD HH:MM:SS" and drop the fractional seconds and " UTC"
df['timestamp'] = pd.to_datetime(df['timestamp'].str[:19])
print(df)
print(df.dtypes)
timestamp
0 2021-09-07 16:36:14
1 2021-09-04 15:31:44
2 2021-07-15 06:49:47
3 2021-09-07 14:55:55
timestamp datetime64[ns]
dtype: object
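The question also asks to save the result to a CSV file, which the answer above doesn't show. A minimal sketch, assuming the column holds the strings from the question: after trimming and parsing, DataFrame.to_csv with date_format controls how the datetimes are written. Called without a path, to_csv returns the CSV text; passing a file name (for example a hypothetical "timestamps.csv") writes it to disk instead.

```python
import pandas as pd

df = pd.DataFrame({"timestamp": [
    "2021-09-07 16:36:14 UTC",
    "2021-09-04 15:31:44 UTC",
    "2021-07-15 06:49:47.320081 UTC",
    "2021-09-07 14:55:55.353145 UTC",
]})

# Keep "YYYY-MM-DD HH:MM:SS" (19 characters) and parse it with an explicit format.
df["timestamp"] = pd.to_datetime(df["timestamp"].str[:19], format="%Y-%m-%d %H:%M:%S")

# No path given, so to_csv returns the CSV content as a string.
# Use df.to_csv("timestamps.csv", ...) (hypothetical file name) to write a file.
csv_text = df.to_csv(index=False, date_format="%Y-%m-%d %H:%M:%S")
print(csv_text)
```

index=False keeps the row numbers out of the file, so it contains only the timestamp column.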
Answer:
You can take the first 19 characters (everything up to and including the seconds):
df['timestamp'] = pd.to_datetime(df['timestamp'].str[:19])
print(df)
# Output
timestamp
0 2021-09-07 16:36:14
1 2021-09-04 15:31:44
2 2021-07-15 06:49:47
3 2021-09-07 14:55:55
If you want to keep the timezone information (UTC), you can remove only the microsecond part:
# Remove a dot followed by one or more digits (the fractional seconds), keep " UTC"
df['timestamp'] = pd.to_datetime(df['timestamp'].str.replace(r'\.\d+', '', regex=True))
print(df)
# Output
timestamp
0 2021-09-07 16:36:14+00:00
1 2021-09-04 15:31:44+00:00
2 2021-07-15 06:49:47+00:00
3 2021-09-07 14:55:55+00:00
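If the timezone-aware values are kept but the output should still look like the format in the question, two options are formatting them with dt.strftime (a pattern without %z prints no offset) or dropping the timezone with dt.tz_localize(None). The sketch below assumes a UTC-aware column; to stay independent of the regex step, it builds one by trimming to 19 characters and passing utc=True, which is an illustrative choice rather than part of the answer.

```python
import pandas as pd

raw = pd.Series([
    "2021-09-07 16:36:14 UTC",
    "2021-09-04 15:31:44 UTC",
    "2021-07-15 06:49:47.320081 UTC",
    "2021-09-07 14:55:55.353145 UTC",
])

# Build a tz-aware column: parse the first 19 characters and mark them as UTC.
aware = pd.to_datetime(raw.str[:19], format="%Y-%m-%d %H:%M:%S", utc=True)

# Option 1: text output; no %z in the pattern, so no "+00:00" is printed.
as_text = aware.dt.strftime("%Y-%m-%d %H:%M:%S")

# Option 2: keep a datetime dtype but remove the timezone information.
naive = aware.dt.tz_localize(None)

print(as_text.tolist())
print(naive)
```

Option 2 keeps the values sortable and comparable as datetimes; option 1 is only useful right before display or export.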
Answer:
Try dateutil's parser, as it can take different formats as input:
from dateutil import parser
# parser.parse() takes a single string; for a whole column, apply it element-wise:
# df['timestamp'] = df['timestamp'].apply(parser.parse)
date = parser.parse("2021-07-15 06:49:47.320081 UTC")
print(date)
2021-07-15 06:49:47.320081+00:00
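Or, to get the output without the fractional seconds and the timezone, format it with strftime:
# For a whole column, this would be:
# df['timestamp'] = df['timestamp'].apply(lambda s: parser.parse(s).strftime("%F %T"))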
print(date.strftime("%F %T"))
2021-07-15 06:49:47
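Because parser.parse handles one string at a time, applying this answer to the whole column means calling it per value. A minimal sketch, assuming the column contains the strings from the question; the helper name to_plain_string is hypothetical. The pattern is spelled out instead of %F %T, since Python's strftime passes directives through to the platform C library, so support for %F and %T depends on the platform.

```python
import pandas as pd
from dateutil import parser

df = pd.DataFrame({"timestamp": [
    "2021-09-07 16:36:14 UTC",
    "2021-09-04 15:31:44 UTC",
    "2021-07-15 06:49:47.320081 UTC",
    "2021-09-07 14:55:55.353145 UTC",
]})


def to_plain_string(value: str) -> str:
    # dateutil reads the optional fractional seconds and the "UTC" suffix;
    # the explicit pattern then leaves both out of the result.
    return parser.parse(value).strftime("%Y-%m-%d %H:%M:%S")


# Element-wise call, since parser.parse does not accept a whole Series.
df["timestamp"] = df["timestamp"].apply(to_plain_string)
print(df)
```

This is slower than the vectorized string slicing in the other answers on large frames, but it tolerates inputs whose layout varies more than just the fractional part.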