I want to add the value of a column (add_value), which holds a number of seconds, to a timestamp column (start_date).
+-----------------------+---------+
|start_date             |add_value|
+-----------------------+---------+
|2022-09-05 00:00:01.394|6.001    |
+-----------------------+---------+
Using the below code with a direct value works fine:
(col('start_date') + expr('INTERVAL 6.001 seconds')).alias("end_date")
But replacing the literal with the column name raises an error:
(col('start_date') + expr('INTERVAL add_value seconds')).alias("end_date")
error:
Syntax error at or near 'seconds': extra input 'seconds' (line 1, pos 22)

== SQL ==
INTERVAL add_value seconds
----------------------^^^
How should I use the column's name?
CodePudding user response:
To create intervals from an existing column, you can build the string 'interval <add_value> seconds' and cast the resulting string to INTERVAL type. Then you can add it to the timestamp start_date:
from pyspark.sql import functions as F
df.withColumn(
    'end_date',
    F.col('start_date') + F.format_string('interval %s seconds', 'add_value').cast('interval')
).show(truncate=False)
# +-----------------------+---------+-----------------------+
# |start_date             |add_value|end_date               |
# +-----------------------+---------+-----------------------+
# |2022-09-05 00:00:01.394|6.001    |2022-09-05 00:00:07.395|
# +-----------------------+---------+-----------------------+
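If you're on Spark 3.0+, a minimal alternative sketch (assuming the same df with start_date and add_value columns) is to skip the string cast and build the interval directly with the SQL function make_interval, whose last argument accepts fractional seconds:

from pyspark.sql import functions as F

# Sketch, assuming Spark 3.0+: make_interval(years, months, weeks, days,
# hours, mins, secs) builds the interval from the add_value column directly.
df.withColumn(
    'end_date',
    F.expr('start_date + make_interval(0, 0, 0, 0, 0, 0, add_value)')
).show(truncate=False)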
CodePudding user response:
INTERVAL doesn't let you use values from a column. But an easy approach is to cast the timestamp to double and then add the decimal seconds. Convert the resulting double value back to a timestamp to get the desired format.
from pyspark.sql import functions as func

spark.sparkContext.parallelize([('2022-09-05 00:00:01.394', 6.001)]).toDF(['start_date', 'add_value']). \
    withColumn('start_date', func.to_timestamp('start_date')). \
    withColumn('start_date_added_value',
               (func.col('start_date').cast('double') + func.col('add_value')).cast('timestamp')
               ). \
    show(truncate=False)
# +-----------------------+---------+-----------------------+
# |start_date             |add_value|start_date_added_value |
# +-----------------------+---------+-----------------------+
# |2022-09-05 00:00:01.394|6.001    |2022-09-05 00:00:07.395|
# +-----------------------+---------+-----------------------+
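On Spark 3.1+, the cast back to timestamp can also be written with timestamp_seconds; a minimal sketch, assuming the DataFrame built above is bound to df:

from pyspark.sql import functions as func

# Sketch, assuming Spark 3.1+: timestamp_seconds converts (fractional)
# epoch seconds back to a timestamp, replacing the cast('timestamp') step.
df.withColumn(
    'start_date_added_value',
    func.timestamp_seconds(func.col('start_date').cast('double') + func.col('add_value'))
).show(truncate=False)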