I have a column in the format yyyy-MM-dd HH:mm:ss.
I want to convert it to time in the following format: HH:mm:ss
I was trying the following function but it didn't work out:
date_format(date, 'HH:mm:ss')
How can I achieve this in Spark SQL?
My code:
date_format(schedule_date, 'HH:mm:ss')
Input: 13-12-2021 4:05:00 AM
Expected output: 4:05:00 AM
Answer:
Your input column schedule_date seems to be of type string, so you need to convert it to TimestampType (for example with to_timestamp) before using the date_format function:
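import pyspark.sql.functions as F
df = spark.createDataFrame([('13-12-2021 4:05:00 AM',)], ['schedule_date'])
df.withColumn(
    'schedule_time',
    # Parse the string into a timestamp first, then format only the time part.
    # 'h' is the 1-12 clock hour and 'a' the AM/PM marker; 'H' (0-23) would print 16:05:00 PM for 4 PM.
    F.date_format(F.to_timestamp('schedule_date', 'dd-MM-yyyy h:mm:ss a'), 'h:mm:ss a')
).show(truncate=False)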
# +---------------------+-------------+
# |schedule_date        |schedule_time|
# +---------------------+-------------+
# |13-12-2021 4:05:00 AM|4:05:00 AM   |
# +---------------------+-------------+
Spark SQL equivalent query:
-- assumes df is registered as a temporary view, e.g. df.createOrReplaceTempView('df')
select date_format(to_timestamp(schedule_date, 'dd-MM-yyyy h:mm:ss a'), 'h:mm:ss a') as schedule_time
from df
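The parse-then-format idea in this answer can be checked in plain Python without a Spark session. This is only an analogy using the standard datetime module, not Spark itself: the helper name to_time_string is hypothetical, and the strptime codes %I and %p play roughly the role of Spark's h and a pattern letters (an English/C locale is assumed for AM/PM).

```python
from datetime import datetime


def to_time_string(value: str) -> str:
    """Hypothetical helper: parse 'dd-MM-yyyy h:mm:ss a' style text, keep only the time."""
    # Step 1: string -> datetime (the role to_timestamp plays in Spark).
    # %I is the 12-hour clock hour (1 or 2 digits accepted), %p the AM/PM marker.
    ts = datetime.strptime(value, "%d-%m-%Y %I:%M:%S %p")
    # Step 2: datetime -> time-only string (the role date_format plays in Spark).
    # strftime's %I always zero-pads, so build the unpadded 12-hour value by hand.
    hour12 = ts.hour % 12 or 12
    return f"{hour12}:{ts:%M:%S} {ts:%p}"


print(to_time_string("13-12-2021 4:05:00 AM"))  # 4:05:00 AM
print(to_time_string("13-12-2021 4:05:00 PM"))  # 4:05:00 PM
```

The explicit hour12 computation mirrors the difference between Spark's h (no padding) and hh (zero-padded) in the output pattern.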
Answer:
You could use either of the options below.
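# Option 1: to_timestamp + date_format
import pyspark.sql.functions as F
df = spark.createDataFrame([('13-12-2021 4:05:00 PM',)], ['schedule_date'])
df.withColumn(
    'schedule_time',
    # 'h' accepts a single-digit hour such as 4; Spark 3's parser expects a single 'a' for AM/PM.
    # 'hh' in the output pattern zero-pads the hour, giving 04:05:00 PM.
    F.date_format(F.to_timestamp('schedule_date', 'dd-MM-yyyy h:mm:ss a'), 'hh:mm:ss a')
).show(truncate=False)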
# Option 2: unix_timestamp + from_unixtime (round trip through whole seconds since the epoch)
import pyspark.sql.functions as F
df = spark.createDataFrame([('13-12-2021 4:05:00 AM',)], ['schedule_date'])
df.withColumn(
    'schedule_time',
    F.from_unixtime(F.unix_timestamp('schedule_date', 'dd-MM-yyyy h:mm:ss a'), 'hh:mm:ss a')
).show(truncate=False)
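Option 2 works by turning the parsed string into whole seconds since the Unix epoch and then formatting those seconds back into a time string. A rough plain-Python analogue of that round trip is below; it is not Spark code, the helper name via_epoch is hypothetical, and UTC is assumed as a stand-in for Spark's session time zone (spark.sql.session.timeZone). The point it illustrates is that the wall-clock time survives as long as both steps use the same zone.

```python
from datetime import datetime, timezone

# Assumption: UTC stands in for Spark's session time zone; both steps must use the same zone.
ZONE = timezone.utc


def via_epoch(value: str) -> str:
    """Hypothetical helper mimicking unix_timestamp followed by from_unixtime."""
    # Like unix_timestamp: parse the text, then convert to whole seconds since the epoch.
    parsed = datetime.strptime(value, "%d-%m-%Y %I:%M:%S %p").replace(tzinfo=ZONE)
    epoch_seconds = int(parsed.timestamp())
    # Like from_unixtime: seconds -> datetime in the same zone, then format only the time.
    back = datetime.fromtimestamp(epoch_seconds, tz=ZONE)
    return back.strftime("%I:%M:%S %p")  # %I zero-pads, similar to Spark's 'hh'


print(via_epoch("13-12-2021 4:05:00 PM"))  # 04:05:00 PM
```

Because the intermediate value is an integer number of seconds, any fractional seconds would be dropped; that is harmless for this input format.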