How to convert a column of datetime with different format to a specific one?

Time:03-10

Hi, I am not an expert in Python, and I am still a beginner with pandas and with working with data. I have a DataFrame df with a column timestamp. The datetimes in the column look like this:

2021-09-07 16:36:14 UTC 
2021-09-04 15:31:44 UTC
2021-07-15 06:49:47.320081 UTC
2021-09-07 14:55:55.353145 UTC

I would like to keep only the date and time, without the "UTC" suffix at the end and without the decimals after the seconds, and then save the DataFrame to a CSV file. Basically, I want the column in this format:

2021-09-07 16:36:14 
2021-09-04 15:31:44
2021-07-15 06:49:47
2021-09-07 14:55:55

I tried these two lines:

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S %Z', errors='coerce')
df['timestamp'] = df['timestamp'].dt.strftime('%Y-%m-%d %H:%M:%S')

This fixes half of the problem: the timestamps without decimals after the seconds are converted correctly, but the ones with decimals come out empty, as in the example below:

2021-09-07 16:36:14 
2021-09-04 15:31:44
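To see why the rows with decimals come out empty, here is a minimal sketch that rebuilds the four sample values as a hypothetical DataFrame and reruns the first line from the question. The format string has no `.%f` directive, so '%S %Z' expects a space right after the seconds; '47.320081 UTC' does not match, and errors='coerce' turns those rows into NaT instead of raising.

```python
import pandas as pd

# Hypothetical DataFrame rebuilt from the four sample values in the question
df = pd.DataFrame({'timestamp': [
    '2021-09-07 16:36:14 UTC',
    '2021-09-04 15:31:44 UTC',
    '2021-07-15 06:49:47.320081 UTC',
    '2021-09-07 14:55:55.353145 UTC',
]})

# Same call as in the question: no '.%f' in the format, so the rows with
# fractional seconds fail to match and errors='coerce' makes them NaT
parsed = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S %Z', errors='coerce')

# True marks the rows that became NaT
print(parsed.isna().tolist())
```

A NaT then becomes NaN after .dt.strftime, which is written as an empty cell.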

Please can someone help me with this problem?

Answer:

Try keeping only the part of the string you want, the first 19 characters (YYYY-MM-DD HH:MM:SS), before parsing it:

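import pandas as pd

# Keep only 'YYYY-MM-DD HH:MM:SS' (the first 19 characters), dropping the decimals and ' UTC'
df['timestamp'] = pd.to_datetime(df['timestamp'].str[:19])
print(df)
print(df.dtypes)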

            timestamp
0 2021-09-07 16:36:14
1 2021-09-04 15:31:44
2 2021-07-15 06:49:47
3 2021-09-07 14:55:55


timestamp    datetime64[ns]
dtype: object
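The question also asks to save the result to a CSV file. Here is a minimal sketch of the whole round trip, assuming every value starts with the same fixed-width 'YYYY-MM-DD HH:MM:SS' prefix as in the sample. An in-memory io.StringIO buffer stands in for the real file, and 'timestamps.csv' in the comment is a hypothetical path.

```python
import io

import pandas as pd

df = pd.DataFrame({'timestamp': [
    '2021-09-07 16:36:14 UTC',
    '2021-09-04 15:31:44 UTC',
    '2021-07-15 06:49:47.320081 UTC',
    '2021-09-07 14:55:55.353145 UTC',
]})

# Keep the first 19 characters and parse them with an explicit format,
# so a value that does not fit the layout raises instead of being guessed
df['timestamp'] = pd.to_datetime(df['timestamp'].str[:19], format='%Y-%m-%d %H:%M:%S')

# Write to an in-memory buffer (pass a path such as 'timestamps.csv' for a real file);
# date_format fixes the text layout of the datetime column in the output
buf = io.StringIO()
df.to_csv(buf, index=False, date_format='%Y-%m-%d %H:%M:%S')
print(buf.getvalue())
```

Passing format= also makes pandas skip format inference, which is faster on large columns.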

Answer:

You can take the first 19 characters (everything up to and including the seconds):

df['timestamp'] = pd.to_datetime(df['timestamp'].str[:19])
print(df)

# Output
            timestamp
0 2021-09-07 16:36:14
1 2021-09-04 15:31:44
2 2021-07-15 06:49:47
3 2021-09-07 14:55:55
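Slicing with str[:19] assumes every value has exactly that fixed-width prefix. If the real column might also contain values that do not fit (the 'n/a' below is a hypothetical example, not from the question), a sketch that extracts the prefix with a regular expression leaves non-matching rows as NaT instead of making to_datetime raise:

```python
import pandas as pd

# Hypothetical column: two values in the expected layout plus one that is not
s = pd.Series(['2021-09-07 16:36:14 UTC', '2021-07-15 06:49:47.320081 UTC', 'n/a'])

# Capture 'YYYY-MM-DD HH:MM:SS' at the start of each string; rows without a
# match become NaN, which pd.to_datetime turns into NaT
prefix = s.str.extract(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})', expand=False)
parsed = pd.to_datetime(prefix, format='%Y-%m-%d %H:%M:%S')
print(parsed)
```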

If you want to keep the timezone information (UTC), you can remove only the microsecond part:

df['timestamp'] = pd.to_datetime(df['timestamp'].str.replace(r'\.\d+', '', regex=True))
print(df)

# Output
                  timestamp
0 2021-09-07 16:36:14+00:00
1 2021-09-04 15:31:44+00:00
2 2021-07-15 06:49:47+00:00
3 2021-09-07 14:55:55+00:00
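The pattern r'\.\d+' (a literal dot followed by one or more digits) removes only the fractional seconds, so ' UTC' survives and the parsed values are timezone-aware. Once the decimals are gone, the question's original format string matches every row. Here is a minimal sketch, on two hypothetical sample values, that checks the cleaned strings and then, if the plain layout from the question is still wanted, formats the aware values with dt.strftime:

```python
import pandas as pd

s = pd.Series([
    '2021-09-04 15:31:44 UTC',
    '2021-07-15 06:49:47.320081 UTC',
])

# Remove a literal dot followed by one or more digits (the fractional seconds)
cleaned = s.str.replace(r'\.\d+', '', regex=True)
print(cleaned.tolist())

# Every row now fits the format from the question; 'UTC' makes the result tz-aware
aware = pd.to_datetime(cleaned, format='%Y-%m-%d %H:%M:%S %Z')
print(aware.dt.tz)

# Format back to text without the offset, matching the layout asked for
print(aware.dt.strftime('%Y-%m-%d %H:%M:%S').tolist())
```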

Answer:

Try dateutil's parser, since it accepts many different input formats:

from dateutil import parser

# For the whole column: df['timestamp'] = df['timestamp'].apply(parser.parse)
date = parser.parse("2021-07-15 06:49:47.320081 UTC")

print(date)
2021-07-15 06:49:47.320081+00:00

Or, to get exactly the format you asked for, format the result with strftime:

# For the whole column:
# df['timestamp'] = df['timestamp'].apply(lambda s: parser.parse(s).strftime("%F %T"))
print(date.strftime("%F %T"))
2021-07-15 06:49:47
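parser.parse takes a single string, so for a whole column it has to be applied element by element. Here is a minimal sketch over the four sample values as a hypothetical Series. It uses '%Y-%m-%d %H:%M:%S', the spelled-out equivalent of '%F %T', because the short forms depend on the platform's C library rather than being part of Python's documented format codes.

```python
import pandas as pd
from dateutil import parser

s = pd.Series([
    '2021-09-07 16:36:14 UTC',
    '2021-09-04 15:31:44 UTC',
    '2021-07-15 06:49:47.320081 UTC',
    '2021-09-07 14:55:55.353145 UTC',
])

# parser.parse accepts one string at a time, so map it over the Series
parsed = s.apply(parser.parse)

# Format each parsed value; the microseconds and the UTC offset are dropped
formatted = parsed.apply(lambda d: d.strftime('%Y-%m-%d %H:%M:%S'))
print(formatted.tolist())
```

Element-wise apply runs in Python, so on large columns the vectorized pandas approaches in the other answers are likely faster.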