I have this method where I am gathering positive values:
def pos_values(df, metrics):
    num_pos_values = df.where(df.ttu > 1).count()
    df.withColumn("loader_ttu_pos_value", num_pos_values)
    df.write.json(metrics)
However, whenever I go to test it I get TypeError: col should be Column. I tried to cast the value, but that doesn't seem to be an option.
CodePudding user response:
The reason you're getting this error is that df.withColumn expects a Column object as its second argument, whereas you're passing num_pos_values, which is an integer.
If you want to assign a literal value to a column (the same value for every row), you can use the lit function from pyspark.sql.functions.
Something like this works:
df = spark.createDataFrame([("2022", "January"), ("2021", "December")], ["Year", "Month"])
df.show()
+----+--------+
|Year|   Month|
+----+--------+
|2022| January|
|2021|December|
+----+--------+
from pyspark.sql.functions import lit
df.withColumn("testColumn", lit(5)).show()
+----+--------+----------+
|Year|   Month|testColumn|
+----+--------+----------+
|2022| January|         5|
|2021|December|         5|
+----+--------+----------+
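Applied to your function, a minimal sketch could look like the following (this assumes, as in your snippet, that df has a ttu column and that metrics is a path to write to; note also that withColumn returns a new DataFrame rather than modifying df in place, so its result has to be kept):

from pyspark.sql.functions import lit

def pos_values(df, metrics):
    # Count the rows where ttu exceeds the threshold
    num_pos_values = df.where(df.ttu > 1).count()
    # Wrap the integer in lit() so withColumn receives a Column,
    # and keep the DataFrame returned by withColumn
    df = df.withColumn("loader_ttu_pos_value", lit(num_pos_values))
    df.write.json(metrics)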