PySpark - Create Timestamp from Date and Hour Columns


I have a Date column and an Hour column in a PySpark DataFrame. How do I combine them to get the Desired_Calculated_Result column shown below?

df1 = sqlContext.createDataFrame(
  [
     ('2021-10-20','1300', '2021-10-20 13:00:00.000+0000')
    ,('2021-10-20','1400', '2021-10-20 14:00:00.000+0000')
    ,('2021-10-20','1500', '2021-10-20 15:00:00.000+0000')
  ]
  ,['Date', 'Hour', 'Desired_Calculated_Result']
)

I also tried:

df1.withColumn("TimeStamp", unix_timestamp(concat_ws(" ", df1.Date, df1.Hour), "yyyy-MM-dd HHmm").cast("timestamp")).show()

This returned all nulls in the TimeStamp column.

CodePudding user response:

from pyspark.sql.functions import concat, unix_timestamp

df1 \
  .withColumn(
    "TimeStamp",
    # concatenate Date and Hour into e.g. "2021-10-201300" and parse it with a matching pattern
    unix_timestamp(concat(df1.Date, df1.Hour), "yyyy-MM-ddHHmm").cast("timestamp")
  ) \
  .show()
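
As a variant, to_timestamp (available since Spark 2.2) can parse the concatenated string straight into a timestamp column, so the unix_timestamp/cast round trip is not needed. A minimal sketch, assuming the same df1 as above:

from pyspark.sql.functions import concat, to_timestamp

# same concatenation ("2021-10-201300"), parsed directly to a TimestampType column
df1 \
  .withColumn("TimeStamp", to_timestamp(concat(df1.Date, df1.Hour), "yyyy-MM-ddHHmm")) \
  .show(truncate=False)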