I have a data frame (test_df) with the following columns and datatypes:
| Column | Data Type |
|---------------------|------------------|
| flowId | object |
| flowName | object |
| executionId | object |
| startedAt | datetime64[ns, tzlocal()] |
| lastUpdatedAt | datetime64[ns, tzlocal()] |
| dataPullStartTime | datetime64[ns, tzlocal()] |
| dataPullEndTime | datetime64[ns, tzlocal()] |
| bytesProcessed | float64 |
| bytesWritten | float64 |
| recordsProcessed | float64 |
Before converting this data frame to JSON, I want to convert the datetime columns to strings with strftime:
test_df[['startedAt', 'lastUpdatedAt', 'dataPullStartTime', 'dataPullEndTime']] = (
    test_df[['startedAt', 'lastUpdatedAt', 'dataPullStartTime', 'dataPullEndTime']]
    .apply(datetime.strftime('%Y-%m-%d %H:%M:%S.%f'), axis=1)
)
However, I always receive the following error:
TypeError: descriptor 'strftime' for 'datetime.date' objects doesn't apply to a 'str' object
I don't understand the problem, because the data type of the respective columns is datetime, not str. Can someone help? I have already tried several solutions from Stack Overflow, but none of them solved the issue.
Thanks in advance!
CodePudding user response:
You can do it one column at a time:
from datetime import datetime
for col in ['startedAt', 'lastUpdatedAt', 'dataPullStartTime', 'dataPullEndTime']:
    test_df[col] = test_df[col].apply(lambda x: datetime.strftime(x, '%Y-%m-%d %H:%M:%S.%f'))
CodePudding user response:
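apply needs a function (a callable) as its argument, not the result of calling one. In your code, datetime.strftime('%Y-%m-%d %H:%M:%S.%f') is evaluated first, with the format string taking the place of the datetime object, so the error is raised before apply ever runs. Pass a lambda instead: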
df["dataPullEndTime"].apply(lambda x: datetime.strftime(x, '%Y-%m-%d %H:%M:%S.%f'))
To reproduce the error without pandas:
datetime.strftime('%Y-%m-%d %H:%M:%S.%f')
TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'str'
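The difference between calling the method and passing a callable can be sketched as follows. This is only an illustration: the names FMT, to_text, and sample are made up here, and the timestamp is an arbitrary value, not data from the question.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f'

# Calling the method on the class with only a format string makes Python
# treat the string as the "self" argument, so a TypeError is raised at once.
try:
    datetime.strftime(FMT)
    raised = False
except TypeError:
    raised = True

# A lambda defers the call until an actual datetime is supplied,
# which is what apply does for each element.
to_text = lambda x: datetime.strftime(x, FMT)
sample = datetime(2022, 1, 1, 12, 30, 45, 123456)  # hypothetical value
formatted = to_text(sample)

print(raised)     # True
print(formatted)  # 2022-01-01 12:30:45.123456
```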
CodePudding user response:
You should rather use the vectorized dt.strftime, applied to each column:
cols = ['startedAt','lastUpdatedAt', 'dataPullStartTime', 'dataPullEndTime']
df[cols] = df[cols].apply(lambda c: c.dt.strftime('%Y-%m-%d %H:%M:%S.%f'))
Or, for in-place modification:
cols = ['startedAt','lastUpdatedAt', 'dataPullStartTime', 'dataPullEndTime']
df.update(df[cols].apply(lambda c: c.dt.strftime('%Y-%m-%d %H:%M:%S.%f')))
Example:
df = pd.DataFrame({'col1': ['2000-01-01'], 'col2': ['2022-01-01']})
df['col1'] = pd.to_datetime(df['col1'])
df['col2'] = pd.to_datetime(df['col2'])
print(df)
        col1       col2
0 2000-01-01 2022-01-01
cols = ['col1', 'col2']
df.update(df[cols].apply(lambda c: c.dt.strftime('%Y-%m-%d %H:%M:%S.%f')))
print(df)
                         col1                        col2
0  2000-01-01 00:00:00.000000  2022-01-01 00:00:00.000000
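Since the goal in the question is JSON output, here is a rough sketch of the whole round trip on timezone-aware columns. The two-row frame, its values, and the UTC timezone are assumptions made for illustration (the question's frame uses tzlocal()), and only a subset of the original columns is included.

```python
import json

import pandas as pd

FMT = '%Y-%m-%d %H:%M:%S.%f'

# Hypothetical stand-in for test_df with one timezone-aware datetime column.
test_df = pd.DataFrame({
    'flowId': ['a1', 'b2'],
    'startedAt': pd.to_datetime(
        ['2022-01-01 08:00:00.000000', '2022-01-02 09:30:15.500000'], utc=True
    ),
    'bytesProcessed': [1024.0, 2048.0],
})

cols = ['startedAt']
out = test_df.copy()
for col in cols:
    # Vectorized formatting through the .dt accessor, one column at a time.
    out[col] = out[col].dt.strftime(FMT)

# With the datetimes already turned into plain strings, to_json writes them
# out exactly as formatted.
records = json.loads(out.to_json(orient='records'))
print(records[0]['startedAt'])  # 2022-01-01 08:00:00.000000
```

Working on a copy keeps the original datetime columns available for further calculations; the strings are only needed at serialization time.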