Create dataframe with timestamp field

Time:07-12

On Databricks, the following code snippet

%python

from pyspark.sql.types import StructType, StructField, TimestampType
from pyspark.sql import functions as F

data = [F.current_timestamp()]
schema = StructType([StructField("current_timestamp", TimestampType(), True)])
df = spark.createDataFrame(data, schema)
display(df)

displays a table with value "null". I would expect to see the current timestamp there. Why is this not the case?

CodePudding user response:

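createDataFrame expects plain Python values (rows of data), not PySpark column expressions. F.current_timestamp() returns a Column, which Spark only evaluates inside a DataFrame transformation such as select or withColumn, so it cannot be used as a data value here.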

  • You could pass Python's datetime.datetime.now():

    import datetime
    
    df = spark.createDataFrame([(datetime.datetime.now(),)], ['ts'])
    

    Or, defining the schema beforehand:

    from pyspark.sql.types import StructType, StructField, TimestampType
    import datetime
    
    data = [(datetime.datetime.now(),)]
    schema = StructType([StructField("current_timestamp", TimestampType(), True)])
    df = spark.createDataFrame(data, schema)
    
  • Or add the timestamp column afterwards:

    from pyspark.sql import functions as F
    
    df = spark.range(3)
    
    # Keep only a new timestamp column
    df1 = df.select(
        F.current_timestamp().alias('ts')
    )
    
    # Or keep the existing columns and append the timestamp
    df2 = df.withColumn('ts', F.current_timestamp())
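
If you go with the datetime approach for more than one row, it can help to capture the timestamp once on the driver so that every row carries the same value, much like F.current_timestamp() returns the same value for all rows within one query. Below is a minimal sketch of that idea. The helper names build_rows and to_dataframe and the "id" column are made up for illustration, and spark is assumed to be an active SparkSession (as it is predefined on Databricks).

```python
import datetime


def build_rows(n):
    """Return n (id, ts) tuples that all share one timestamp."""
    # Capture the timestamp once so every row gets the same value.
    now = datetime.datetime.now()
    # Each row is a tuple of plain Python values, which is what
    # createDataFrame expects. A one-column row needs a trailing
    # comma, e.g. (now,), otherwise it is not a tuple.
    return [(i, now) for i in range(n)]


def to_dataframe(spark, n):
    """Hand the driver-side rows to Spark (hypothetical helper)."""
    # `spark` is assumed to be an active SparkSession.
    return spark.createDataFrame(build_rows(n), ["id", "ts"])


rows = build_rows(3)
# Compare the first and last timestamps: they are the same object.
print(rows[0][1] == rows[2][1])
```

On a cluster you would then call to_dataframe(spark, 3). Note that this timestamp is taken on the driver when the rows are built, whereas F.current_timestamp() is evaluated when the query runs.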
    