Convert StringType to TimeStamp on Pyspark

Time:12-30

How can I convert a column with string values in this format "Dec 25 2022 6:31AM" to Timestamp?

No matter what I do, I still get null values in the new column.

I've tried:

import pyspark.sql.functions as fn

df.withColumn('new_ts', fn.col('SendTime').cast("timestamp"))
df.withColumn("new_ts",fn.to_timestamp(fn.col("SendTime")).cast('string'))
df.withColumn('new_ts', (fn.to_timestamp('SendTime', 'yyyy-MM-dd HH:mm:ss.SSS-0300')).cast('date'))

among other attempts.

CodePudding user response:

You were close. to_timestamp is the correct function in your case, but you need to fix your datetime pattern so that it matches the layout of your strings.

I was able to figure out something like this:

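from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Reuse the active session if there is one (e.g. in a notebook or the pyspark shell)
spark = SparkSession.builder.getOrCreate()

data1 = [
    ["Dec 25 2022 6:31AM"],
    ["Nov 11 2022 02:31AM"],
    ["Jun 03 2022 08:31PM"]
]

df = spark.createDataFrame(data1).toDF("time")

# MMM = month abbreviation, dd = day, yyyy = year, h = hour (1-12), mm = minutes, a = AM/PM
tmp = df.withColumn("test", F.to_timestamp(F.col("time"), "MMM dd yyyy h:mma"))
tmp.show(truncate=False)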

And the output is:

+-------------------+-------------------+
|time               |test               |
+-------------------+-------------------+
|Dec 25 2022 6:31AM |2022-12-25 06:31:00|
|Nov 11 2022 02:31AM|2022-11-11 02:31:00|
|Jun 03 2022 08:31PM|2022-06-03 20:31:00|
+-------------------+-------------------+

So I think you can try this format: MMM dd yyyy h:mma
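If you want to sanity-check a layout on a few sample strings before running it on the whole DataFrame, you can do it locally without a Spark session. The sketch below uses Python's own datetime.strptime, not Spark. The directive string "%b %d %Y %I:%M%p" is my assumed counterpart of MMM dd yyyy h:mma, and parse_send_time is just a made-up helper name; the two parsers are not guaranteed to agree on every edge case, and %b / %p assume an English locale.

```python
from datetime import datetime

# Assumed strptime counterpart of the Spark pattern "MMM dd yyyy h:mma"
LOCAL_FORMAT = "%b %d %Y %I:%M%p"


def parse_send_time(value, fmt=LOCAL_FORMAT):
    """Return a datetime if value matches fmt, otherwise None.

    Returning None on a mismatch mimics the null that shows up in Spark
    when the pattern does not fit the string.
    """
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


samples = ["Dec 25 2022 6:31AM", "Nov 11 2022 02:31AM", "Jun 03 2022 08:31PM"]
for s in samples:
    print(s, "->", parse_send_time(s))

# A pattern that describes a different layout fails on the same input
print(parse_send_time("Dec 25 2022 6:31AM", "%Y-%m-%d %H:%M:%S"))
```

If every sample comes back as None here, the pattern itself is the likely problem rather than the data.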

CodePudding user response:

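The to_timestamp() function in PySpark converts a string column to TimestampType. If you omit the format argument, it follows the same rules as a cast to timestamp, which expect ISO-like strings such as "yyyy-MM-dd HH:mm:ss[.SSS]". If the input does not match the expected form, it returns null (with ANSI mode disabled), which is why a string like "Dec 25 2022 6:31AM" needs an explicit pattern.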
