How to typecast string to date in pyspark?


I want to typecast a string column to date format.

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

data_source_df = spark.createDataFrame(
        data=[("1", "2019.06.24 12:01:19")],
        schema=["id", "input"])
# data_source_df.printSchema()
data_source_df.show()

# attempt to cast the string column to a date
data_source_df = data_source_df.withColumn("input", to_date("input", "MM/dd/yyyy"))

I tried the code above; the column is cast to date, but the resulting value is null.

+---+-----+
| id|input|
+---+-----+
|  1| null|
+---+-----+

Any help would be appreciated!!!

CodePudding user response:

This works for me:

data_source_df = data_source_df.withColumn("input", to_date("input", "yyyy.MM.dd"))
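Note that whether a date-only pattern succeeds when the string still carries a time component can depend on the Spark version and the spark.sql.legacy.timeParserPolicy setting (Spark 3's stricter parser may reject trailing text that the older parser ignored). A minimal self-contained sketch of this answer, using the column names from the question and an arbitrary app name:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName('to_date_sketch').getOrCreate()

df = spark.createDataFrame([("1", "2019.06.24 12:01:19")], ["id", "input"])

# The pattern covers only the leading date portion of the string;
# a stricter parser may require the pattern to match the whole value.
df = df.withColumn("input", to_date("input", "yyyy.MM.dd"))

df.printSchema()  # `input` should now be reported as date
df.show()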

CodePudding user response:

Your direction of using to_date() is correct, but change the format to yyyy.MM.dd HH:mm:ss (lowercase mm is minutes and ss is seconds; uppercase MM means month), that is

data_source_df = data_source_df.withColumn("input", to_date("input", "yyyy.MM.dd HH:mm:ss"))

Your null value comes from the format not matching the input string.
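For reference, a minimal self-contained sketch of this fix (the SparkSession app name is arbitrary and the column names follow the question); it also shows an alternative that parses the full timestamp with to_timestamp first and then truncates it to a date:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, to_timestamp

spark = SparkSession.builder.appName('to_date_fix_sketch').getOrCreate()

df = spark.createDataFrame([("1", "2019.06.24 12:01:19")], ["id", "input"])

# Option 1: parse with the full pattern; to_date drops the time part.
df = df.withColumn("input_date", to_date("input", "yyyy.MM.dd HH:mm:ss"))

# Option 2: parse to a timestamp first, then convert to a date;
# this keeps the full timestamp available if it is needed later.
df = df.withColumn("input_ts", to_timestamp("input", "yyyy.MM.dd HH:mm:ss"))
df = df.withColumn("input_date2", to_date("input_ts"))

df.show(truncate=False)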
