I have data from an S3 bucket and want to convert the Date column from string to date. The current Date column is in the format 7/1/2022 12:0:15 AM
.
Current code I am using in AWS Glue Studio to attempt the custom transformation:
MyTransform (glueContext, dfc) -> DynamicFrameCollection:
from pyspark.sql.functions import col, to_timestamp
df = dfc.select(list(dfc.keys())[0]).toDF()
df = df.withColumn('Date',to_timestamp(col("Date"), 'MM/dd/yyyy HH:MM:SS'))
df_res = DynamicFrame.fromDF(df, glueContext, "df")
return(DynamicFrameCollection({"CustomTransform0": df_res}, glueContext))
With MM/dd/yyyy HH:MM:SS
date formatting, it runs but returns null for the Date column. When I try any other date format besides this, it errors out. I suspect the date formatting may be the issue, but I am not certain.
CodePudding user response:
After converting string to timestamp you need to cast it to date type, like this:
df = df.withColumn(df_col, df[df_col].cast("date"))
CodePudding user response:
We ended up removing the HH:MM:SS
portion of the date format and this worked for our needs. I would still be interested if anyone can figure out how to get the hours, minutes, seconds, and AM/PM to work, but we can do without for now.