Spark concatenating strings using withColumn()

So I have the given dataframe:

+--------------------+-------------+
|           entity_id|        state|
+--------------------+-------------+
|             ha_tdeg|         39.9|
|         memory_free|       1459.4|
|            srv_tdeg|         39.0|
|       as_tempera...|          9.5|
|         as_humidity|        81.71|
|         as_pressure|      1003.35|
|      as_am_humidity|        22.16|
|      as_pm_humidity|         4.64|
|         memory_free|       1460.0|
|             ha_tdeg|         38.0|
|         memory_free|       1459.3|
+--------------------+-------------+

I'm trying to append a percent sign to every "state" value where "entity_id" contains 'humidity'. As the code below shows, I cast the "state" column to String before working with it. But whenever I run the command and try to concatenate '%' (or any other string), all the values become null. Interestingly, if I try to concatenate a number wrapped in a String ("10"), it performs an arithmetic addition instead.

What is the way to overcome this issue?

Here is the code I'm using:

var humidityDF = df.filter(forExport("entity_id").contains("humidity") && df("state").isNotNull)
humidityDF = humidityDF.withColumn("state", humidityDF("state").cast("String"))
humidityDF = humidityDF.withColumn("state", col("state") + "%")

I also tried:

humidityDF = humidityDF.withColumn("state", col("state").toString + "%")

But this doesn't work since 'withColumn' accepts only Column type parameters.

CodePudding user response：

import org.apache.spark.sql.functions.{col, concat, lit}
// concat() joins string columns; lit("%") wraps the literal as a Column
humidityDF = humidityDF.withColumn("state", concat(col("state"), lit("%")))
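
Some background on why this helps: on a Column, `+` is an arithmetic operator, not string concatenation. The behaviour described in the question (null for "%", a sum for "10") is consistent with Spark casting both operands to numbers, so a string that cannot be parsed as a number ends up as null. Below is a minimal, self-contained sketch of the `concat` approach. The object name `HumidityPercent`, the helper names `humidityOnly` and `markHumidity`, the local SparkSession, and the three sample rows are assumptions made for illustration; they are not from the original post. The second helper is an optional variant that uses `when`/`otherwise` to keep every row and change only the humidity ones.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit, when}

object HumidityPercent {

  // Keep only the humidity rows and append "%" to their state.
  // concat() works on string columns; lit() wraps the literal as a Column.
  def humidityOnly(df: DataFrame): DataFrame =
    df.filter(col("entity_id").contains("humidity") && col("state").isNotNull)
      .withColumn("state", concat(col("state").cast("string"), lit("%")))

  // Variant: keep every row, but only append "%" where entity_id
  // contains "humidity"; all other states are left unchanged.
  def markHumidity(df: DataFrame): DataFrame =
    df.withColumn(
      "state",
      when(col("entity_id").contains("humidity"),
           concat(col("state").cast("string"), lit("%")))
        .otherwise(col("state").cast("string"))
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup).
    val spark = SparkSession.builder()
      .appName("humidity-percent")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A few sample rows modelled on the table in the question.
    val df = Seq(
      ("ha_tdeg", "39.9"),
      ("as_humidity", "81.71"),
      ("as_am_humidity", "22.16")
    ).toDF("entity_id", "state")

    humidityOnly(df).show()
    markHumidity(df).show()

    spark.stop()
  }
}
```

Note that `concat` returns null if any of its inputs is null, which is why the filter on `isNotNull` from the question is kept in `humidityOnly`.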