spark scala "Overloaded method value select with alternatives" when trying to get the max


df.show
+-------+-----+----------+--------+------------+----+
|     id|  val|      date|    time|         use|flag|
+-------+-----+----------+--------+------------+----+
|8200732|    1|2015-01-06|11:48:30|30065.221532|   0|
|8200733|    1|2015-01-06|11:48:40|30065.225763|   0|
|8200734|    1|2015-01-06|11:48:50|30065.229994|   0|
|8200735|    1|2015-01-06|11:49:00|30065.234225|   0|
+-------+-----+----------+--------+------------+----+

I am trying to get the average use for each date value. Here is what I try:

 df.select("date",max($"use")).show()
<console>:26: error: overloaded method value select with alternatives:
  [U1, U2](c1: org.apache.spark.sql.TypedColumn[org.apache.spark.sql.Row,U1], c2: org.apache.spark.sql.TypedColumn[org.apache.spark.sql.Row,U2])org.apache.spark.sql.Dataset[(U1, U2)] <and>
  (col: String,cols: String*)org.apache.spark.sql.DataFrame <and>
  (cols: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
 cannot be applied to (String, org.apache.spark.sql.Column)

I am not sure what I am doing wrong; I have tried to rewrite this many times, but each time I get an error. I can get the max value of use on its own, but getting the max value of use for each date is causing me issues.

I cannot use Spark SQL or PySpark for this.

CodePudding user response:

That's because your call doesn't match any of the overloaded signatures of the select method on DataFrame. The one you wrote is:

df.select("date",max($"use")).show()

And if you notice, "date" is a String literal, while max($"use") is a Column. The (col: String, cols: String*) overload takes only strings, and the (cols: Column*) overload takes only Columns, so mixing a String with a Column matches neither. You should use the date column instead of the literal date string:

// notice the $ before date here
// (requires import org.apache.spark.sql.functions.max;
// in spark-shell the $ syntax is already in scope via spark.implicits._)
df.select($"date",max($"use")).show()
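Note that while adding the $ fixes the compile-time overload error, this particular query will still fail at runtime with an AnalysisException, because max is an aggregate function and date is not part of any grouping. A minimal sketch of the grouped form (computing the max from the title; assumes the df shown in the question):

import org.apache.spark.sql.functions.max

// group by date so the aggregate runs once per date value
df.groupBy($"date").agg(max($"use")).show()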

CodePudding user response:

Here is what you should do to get the average use for each date value:

// requires import org.apache.spark.sql.functions.mean
df.groupBy("date").agg(mean("use")).show()
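If you also want the max per date (as in the title), or a friendlier column name than the default avg(use), a sketch along these lines should work (the avg_use and max_use names are just illustrative):

import org.apache.spark.sql.functions.{max, mean}

df.groupBy("date").agg(
  mean("use").alias("avg_use"),  // average use per date
  max("use").alias("max_use")    // max use per date
).show()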