TypeError: col should be Column with Apache Spark


I have this method where I am gathering positive values:

def pos_values(df, metrics):
    num_pos_values = df.where(df.ttu > 1).count()

    df.withColumn("loader_ttu_pos_value", num_pos_values)

    df.write.json(metrics)

However, I get TypeError: col should be Column whenever I go to test it. I tried casting it, but that doesn't seem to be an option.

CodePudding user response:

The reason you're getting this error is that df.withColumn expects a Column object as its second argument, whereas you're passing num_pos_values, which is an integer.

If you want to assign a literal value to a column (the same value for every row), you can use the lit function from pyspark.sql.functions.

Something like this works:

df = spark.createDataFrame([("2022", "January"), ("2021", "December")], ["Year", "Month"])

df.show()
+----+--------+
|Year|   Month|
+----+--------+
|2022| January|
|2021|December|
+----+--------+

from pyspark.sql.functions import lit

df.withColumn("testColumn", lit(5)).show()
+----+--------+----------+
|Year|   Month|testColumn|
+----+--------+----------+
|2022| January|         5|
|2021|December|         5|
+----+--------+----------+
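
Applied to your pos_values method, the fix might look like the sketch below. Note that withColumn returns a new DataFrame rather than modifying df in place, so you also need to capture the result before writing:

from pyspark.sql.functions import lit

def pos_values(df, metrics):
    # count() returns a plain Python integer, not a Column
    num_pos_values = df.where(df.ttu > 1).count()

    # wrap the integer in lit() so withColumn receives a Column;
    # reassign, since withColumn returns a new DataFrame
    df = df.withColumn("loader_ttu_pos_value", lit(num_pos_values))

    df.write.json(metrics)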