I have a column called createdtime that contains a few nulls. All I want is to fill those nulls with the current timestamp.
I have tried the code below, where I assign the time manually. I want it to work in such a way that whenever I run this code it picks up current_timestamp():
from pyspark.sql.functions import *
default_time = '2022-06-28 05:07:29.077'
df = df.fillna({'createdtime': default_time})
I have also tried the method below, but it gives an error: TypeError: Column is not iterable.
from pyspark.sql.functions import *
default_time = current_timestamp()
df = df.fillna({'createdtime': default_time})
CodePudding user response:
fillna expects a literal value, so default_time needs to be a quoted string rather than a Column.
default_time = '2022-06-28 05:07:29.077'
df = df.fillna({'createdtime': default_time})
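Or use the coalesce function, which returns the first non-null value and accepts a Column, so current_timestamp() is evaluated each time the code runs.
from pyspark.sql import functions as F
df = df.withColumn('createdtime', F.coalesce('createdtime', F.current_timestamp()))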
CodePudding user response:
Because fillna accepts a string and not a Column, you can use the code below:
import datetime
# fillna returns a new DataFrame, so assign the result back
df = df.fillna({"createdtime": str(datetime.datetime.utcnow())})
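If you want the generated string to match the 'yyyy-MM-dd HH:mm:ss.SSS' layout of the literal in the question, a minimal sketch along these lines may help. The helper name timestamp_literal is hypothetical (it is not part of PySpark), and the sketch assumes UTC is the timezone you want; adjust if your session uses a different one.

```python
from datetime import datetime, timezone


def timestamp_literal(now=None):
    """Return a 'YYYY-MM-DD HH:MM:SS.mmm' string (hypothetical helper)."""
    if now is None:
        # Assumption: UTC is the desired timezone for the fill value.
        now = datetime.now(timezone.utc)
    # strftime has no millisecond directive, so derive it from microseconds.
    return now.strftime('%Y-%m-%d %H:%M:%S.') + f'{now.microsecond // 1000:03d}'


default_time = timestamp_literal()
# With the DataFrame from the question, this would then be used as:
# df = df.fillna({'createdtime': default_time})
```

Note that the string is computed once on the driver when the line runs, so every null gets the same value; the coalesce approach above leaves the evaluation to Spark instead.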