On Databricks, the following code snippet
```python
%python
from pyspark.sql.types import StructType, StructField, TimestampType
from pyspark.sql import functions as F
data = [F.current_timestamp()]
schema = StructType([StructField("current_timestamp", TimestampType(), True)])
df = spark.createDataFrame(data, schema)
display(df)
```
displays a table with value "null". I would expect to see the current timestamp there. Why is this not the case?
Answer:
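`createDataFrame` does not accept PySpark expressions. `F.current_timestamp()` does not return a timestamp value. It returns a `Column`, an unevaluated expression that Spark computes only when it is used in a DataFrame transformation such as `select` or `withColumn`. `createDataFrame` expects local data, such as a list of tuples of plain Python values, so the expression is never evaluated and the table shows null.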
You could pass Python's `datetime.datetime.now()` instead:

```python
import datetime

# The timestamp is taken on the driver as a plain Python datetime value.
df = spark.createDataFrame([(datetime.datetime.now(),)], ['ts'])
```
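Each element of the list passed to `createDataFrame` is one row, so a single-column row has to be written as a one-element tuple. The trailing comma in `(datetime.datetime.now(),)` is what creates that tuple; parentheses alone only group an expression. A plain-Python illustration:

```python
import datetime

now = datetime.datetime.now()

# Parentheses alone only group an expression: this is still a datetime.
not_a_row = (now)

# The trailing comma creates a one-element tuple, i.e. a row with one field.
row = (now,)

# One row with one column, the shape used in the createDataFrame calls.
data = [row]
print(type(not_a_row).__name__, type(row).__name__, len(row))  # datetime tuple 1
```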
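Defining the schema beforehand:

```python
import datetime
from pyspark.sql.types import StructType, StructField, TimestampType

# One row holding a plain Python datetime, matching the single TimestampType field.
data = [(datetime.datetime.now(),)]
schema = StructType([StructField("current_timestamp", TimestampType(), True)])
df = spark.createDataFrame(data, schema)
```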
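Or add the timestamp column afterwards:

```python
from pyspark.sql import functions as F

df = spark.range(3)  # 3 rows with a single column `id`

# Keep only the timestamp column...
df1 = df.select(F.current_timestamp().alias('ts'))

# ...or append it next to the existing `id` column.
df2 = df.withColumn('ts', F.current_timestamp())
```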