How to fill null values with a current_timestamp() in PySpark DataFrame?

Time:06-28

I have a column called createdtime that contains a few nulls. All I want is to fill those nulls with the current timestamp.

I tried the code below, where I assign the time manually. I want it to work in such a way that whenever I run the code, it picks up current_timestamp().

from pyspark.sql.functions import *
default_time = '2022-06-28 05:07:29.077'
df = df.fillna({'createdtime': default_time})

I also tried the method below, but it fails with an error: TypeError: Column is not iterable.

from pyspark.sql.functions import *
default_time = current_timestamp()
df = df.fillna({'createdtime': default_time})


Answer:

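fillna only accepts literal values (int, float, bool, or str), not a Column expression such as current_timestamp(), which is why the second attempt fails. A fixed timestamp therefore has to be passed as a plain string:

default_time = '2022-06-28 05:07:29.077'
df = df.fillna({'createdtime': default_time})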

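To use the time at which the job runs instead, use the coalesce function. It returns the first non-null argument, so existing values are kept and nulls are replaced by current_timestamp():

from pyspark.sql import functions as F
df = df.withColumn('createdtime', F.coalesce('createdtime', F.current_timestamp()))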
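To make the row-by-row behaviour of the coalesce approach concrete, here is a plain-Python model of it. It is an illustration only, not Spark code: `fill_nulls_like_coalesce` is a hypothetical helper and rows are assumed to be dicts. The Spark documentation for current_timestamp() says it is taken once at the start of query evaluation, so every null in the same query gets the same value; the model mirrors that by reading the clock once. One simplification to keep in mind: Spark uses the session time zone, while this sketch just uses the Python process's local clock.

```python
import datetime
from typing import Any, Dict, List, Optional


def fill_nulls_like_coalesce(
    rows: List[Dict[str, Any]],
    column: str,
    now: Optional[datetime.datetime] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical plain-Python model of
    df.withColumn(column, F.coalesce(column, F.current_timestamp())).
    """
    # current_timestamp() is fixed for the whole query, so read the clock once.
    if now is None:
        now = datetime.datetime.now()
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        # coalesce: keep the existing value unless it is None (i.e. null)
        if new_row.get(column) is None:
            new_row[column] = now
        filled.append(new_row)
    return filled


rows = [
    {"id": 1, "createdtime": datetime.datetime(2022, 6, 28, 5, 7, 29)},
    {"id": 2, "createdtime": None},
    {"id": 3, "createdtime": None},
]
result = fill_nulls_like_coalesce(rows, "createdtime")
```

In the model, row 1 keeps its original value, and rows 2 and 3 receive the same timestamp, which is the behaviour you should expect from the Spark one-liner within a single query.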

Answer:

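Because fillna accepts a literal value such as a string, not a Column, you can compute the current time in Python and pass it as a string:

import datetime
# Computed once on the driver, in UTC, at the moment this line runs
now = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")
df = df.fillna({"createdtime": now})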
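If you take the fillna route, note that the value is produced by Python on the driver when the code runs, not by Spark. A small helper keeps the string format consistent across columns. This is a sketch under assumptions: `build_timestamp_fill_map` is a hypothetical name, and the `%Y-%m-%d %H:%M:%S.%f` layout is chosen on the assumption that Spark casts a string in that shape to the timestamp type of the target column; check this against your Spark version and column types.

```python
import datetime
from typing import Dict, Iterable, Optional


def build_timestamp_fill_map(
    columns: Iterable[str],
    now: Optional[datetime.datetime] = None,
) -> Dict[str, str]:
    """Hypothetical helper: map each column name to a timestamp string.

    The result is meant to be passed to df.fillna(...). The time is read
    here, in the Python process, at the moment this function is called.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Always emit microseconds; str(datetime) drops them when they are zero.
    value = now.strftime("%Y-%m-%d %H:%M:%S.%f")
    return {column: value for column in columns}


fill_map = build_timestamp_fill_map(["createdtime"])
# Usage (assumes an existing DataFrame `df`):
# df = df.fillna(fill_map)
```

Passing `now` explicitly makes the helper deterministic, which is handy when you want several DataFrames filled with the exact same value.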
