How to understand `e:` and `columnsName` in spark error message when using window function?

Time:10-15

I have some very simple code:

val win = Window.partitionBy("app").orderBy("date")
val appSpendChange = appSpend
  .withColumn("prevSpend", lag(col("Spend")).over(win))
  .withColumn("spendChange", when(isnull($"Spend" - "prevSpend"), 0)
              .otherwise($"spend" - "prevSpend"))
display(appSpendChange)

I expected this to work, because I adapted it to Scala from a PySpark example: "Pyspark Column Transformation: Calculate Percentage Change for Each Group in a Column".

However, I get this error:

error: overloaded method value lag with alternatives:
  (e: org.apache.spark.sql.Column,offset: Int,defaultValue: Any,ignoreNulls: Boolean)org.apache.spark.sql.Column <and>
  (e: org.apache.spark.sql.Column,offset: Int,defaultValue: Any)org.apache.spark.sql.Column <and>
  (columnName: String,offset: Int,defaultValue: Any)org.apache.spark.sql.Column <and>
  (columnName: String,offset: Int)org.apache.spark.sql.Column <and>
  (e: org.apache.spark.sql.Column,offset: Int)org.apache.spark.sql.Column
     cannot be applied to (org.apache.spark.sql.Column)

  .withColumn("prevPctSpend", lag(col("pctCtvSpend")).over(win))
                          ^

How should I read this error, especially the `e:` annotation? Thanks, I appreciate any feedback.

CodePudding user response:

Read the error as follows:

  • there are 5 overloads of the method lag, each listed as (<parameters>)<return type>; inside the parentheses, `e` and `columnName` are simply the parameter names, each followed by its type:
    • (e: org.apache.spark.sql.Column,offset: Int,defaultValue: Any,ignoreNulls: Boolean)org.apache.spark.sql.Column
    • (e: org.apache.spark.sql.Column,offset: Int,defaultValue: Any)org.apache.spark.sql.Column
    • (columnName: String,offset: Int,defaultValue: Any)org.apache.spark.sql.Column
    • (columnName: String,offset: Int)org.apache.spark.sql.Column
    • (e: org.apache.spark.sql.Column,offset: Int)org.apache.spark.sql.Column
  • none of these alternatives accepts the argument list you passed, a single argument of type (org.apache.spark.sql.Column)
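
To make the role of `e:` and `columnName:` concrete, here is a minimal plain-Scala sketch of the same situation. The `Column` class and `OverloadDemo` object are hypothetical stand-ins written for this illustration, not Spark's real classes; only the shape of the overloads mirrors the error message.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Column (illustration only).
final case class Column(name: String)

object OverloadDemo {
  // `e` is just the name of the first parameter; its type is Column.
  def lag(e: Column, offset: Int): String = s"lag(${e.name}, $offset)"

  // `columnName` is the name of the first parameter of the String overload.
  def lag(columnName: String, offset: Int): String = lag(Column(columnName), offset)

  def main(args: Array[String]): Unit = {
    // lag(Column("Spend"))  // would not compile: no overload takes a single Column
    println(lag(Column("Spend"), 1))              // picks the (e: Column, offset: Int) overload
    println(lag("Spend", 1))                      // picks the (columnName: String, offset: Int) overload
    println(lag(e = Column("Spend"), offset = 1)) // parameter names can be used as named arguments
  }
}
```

Uncommenting the one-argument call should produce a compiler error of the same "overloaded method ... cannot be applied to" form as the one in the question.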

In short, the compiler is telling you that you called lag with missing or invalid arguments.

As @Dima said, you likely want to add a second argument (offset) to your call to lag.
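
A possible corrected version is sketched below. It assumes an offset of 1 (the previous row) is what was intended. The sample rows, the local SparkSession and the names `LagExample` and `withSpendChange` are made up for this illustration. The sketch also replaces the string literal `"prevSpend"` with `col("prevSpend")`, since `$"Spend" - "prevSpend"` appears to subtract a literal string rather than the column, and it uses `show()` in place of the notebook-only `display`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, isnull, lag, lit, when}

object LagExample {
  // Adds the previous spend per app and the change relative to it.
  def withSpendChange(appSpend: DataFrame): DataFrame = {
    val win = Window.partitionBy("app").orderBy("date")
    appSpend
      // Offset 1 (assumed): take the value from the previous row in the window.
      .withColumn("prevSpend", lag(col("Spend"), 1).over(win))
      // The first row of each app has no previous row, so the difference is null there.
      .withColumn("spendChange",
        when(isnull(col("Spend") - col("prevSpend")), lit(0.0))
          .otherwise(col("Spend") - col("prevSpend")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("lag-example").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the asker's appSpend DataFrame.
    val appSpend = Seq(
      ("a", "2021-10-01", 10.0),
      ("a", "2021-10-02", 15.0),
      ("b", "2021-10-01", 7.0)
    ).toDF("app", "date", "Spend")

    withSpendChange(appSpend).show()
    spark.stop()
  }
}
```

The String overload, `lag("Spend", 1)`, would work equally well here; both appear in the list of alternatives in the error message.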
