I'm using Databricks to join some tables that are stored as Parquet files in ADLS. I'm reading the files, registering the DataFrames as temp views, and then building up the JOIN syntax inside spark.sql('').
In the JOIN I have to replicate some SQL code previously developed by my colleagues, which used the T-SQL ISNULL function like this:
ISNULL(titledict.Category_Level_1, urldict.Category_Level_1)
Basically, they pass a replacement value as the second argument, which is something I cannot do with the ISNULL function in Spark SQL (it only accepts a single argument and returns a boolean).
"titledict" and "urldict" are aliases for two of the tables in the JOIN.
What would be the best way to replicate the T-SQL ISNULL function in this situation?
CodePudding user response:
There is the coalesce function, which does the same thing: it returns the first non-null value among its arguments, so with two arguments you get the second value whenever the first one is null.
from pyspark.sql.functions import coalesce, col, lit
df = df.withColumn('test', coalesce(col('test_value'), lit('Fallback value')))
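Since you build the JOIN inside spark.sql(''), the same function is available in the SQL string as COALESCE, so the T-SQL expression translates one-to-one. A minimal sketch, assuming the temp views from the question and a placeholder join key:

# COALESCE in Spark SQL returns the first non-null argument,
# matching the two-argument T-SQL ISNULL behaviour.
joined = spark.sql("""
    SELECT
        COALESCE(titledict.Category_Level_1, urldict.Category_Level_1) AS Category_Level_1
    FROM titledict
    LEFT JOIN urldict
        ON titledict.Url_Key = urldict.Url_Key   -- placeholder join condition
""")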