I want to convert a string column to a date type.
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

data_source_df = spark.createDataFrame(
    data=[("1", "2019.06.24 12:01:19")],
    schema=["id", "input"])
# data_source_df.printSchema()
data_source_df.show()

# Parse the string column into a DateType column
data_source_df = data_source_df.withColumn("input", to_date("input", "MM/dd/yyyy"))
data_source_df.show()
I tried the code above. The column is converted to a date type, but the resulting value is null:
+---+-----+
| id|input|
+---+-----+
|  1| null|
+---+-----+
Any help would be appreciated!
Answer:
This works for me:
data_source_df = data_source_df.withColumn("input", to_date("input", "yyyy.MM.dd"))
Answer:
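Your approach of using to_date() is correct; you just need to change the format to yyyy.MM.dd HH:mm:ss so that it matches the whole input string:
data_source_df = data_source_df.withColumn("input", to_date("input", "yyyy.MM.dd HH:mm:ss"))
Your null value comes from the incorrect format: MM/dd/yyyy does not match a string like 2019.06.24 12:01:19. Note that pattern letters are case-sensitive: MM is the month, mm is the minute, and ss is the second.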