Learning Spark: Example with where doesn't work


I'm trying to run an example from the book Learning Spark.

It uses a column in a where expression like this:

val fewFireDF = fireDF
    .select("IncidentNumber", "AvailableDtTm", "CallType")
    .where($"CallType" =!= "Medical Incident")

But IntelliJ IDEA doesn't understand $"CallType". It treats it as a plain string.

These variations work well:

.where(col("CallType") =!= "Medical Incident")
.where("CallType != 'Medical Incident'")

UPDATE: It seems I didn't explain my problem clearly.

Here is my code:

package org.example.chapter3

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.dsl.expressions.{DslExpression, StringToAttributeConversionHelper}
import org.apache.spark.sql.types.{BooleanType, FloatType, IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.functions._

object DepartmentCalls extends App {
  val spark = SparkSession
    .builder
    .getOrCreate()

  if (args.length < 1) {
    println("usage DepartmentCalls <file path to fire_incidents.csv>")
    sys.exit(1)
  }

  val schema = StructType(Array(
      StructField("CallNumber", IntegerType),
      StructField("UnitID", StringType),
      StructField("IncidentNumber", IntegerType),
      StructField("CallType", StringType),
      StructField("CallDate", StringType),
      StructField("WatchDate", StringType),
      StructField("CallFinalDisposition", StringType),
      StructField("AvailableDtTm", StringType),
      StructField("Address", StringType),
      StructField("City", StringType),
      StructField("Zipcode", IntegerType),
      StructField("Battalion", StringType),
      StructField("StationArea", StringType),
      StructField("Box", StringType),
      StructField("OriginalPriority", StringType),
      StructField("Priority", StringType),
      StructField("FinalPriority", IntegerType),
      StructField("ALSUnit", BooleanType),
      StructField("CallTypeGroup", StringType),
      StructField("NumAlarms", IntegerType),
      StructField("UnitType", StringType),
      StructField("UnitSequenceInCallDispatch", IntegerType),
      StructField("FirePreventionDistrict", StringType),
      StructField("SupervisorDistrict", StringType),
      StructField("Neighborhood", StringType),
      StructField("Location", StringType),
      StructField("RowID", StringType),
      StructField("Delay", FloatType)
  ))

  // Read the file using the CSV DataFrameReader
  val sfFireFile = args(0)
  val fireDF = spark.read.schema(schema)
    .option("header", "true")
    .csv(sfFireFile)

  // This is the line that fails to compile
  val fewFireDF = fireDF
    .select("IncidentNumber", "AvailableDtTm", "CallType")
    .where($"CallType" =!= "Medical Incident")

  fewFireDF.show(5, false)
}


IntelliJ reports the following errors:

  1. Cannot resolve overloaded method 'where'
  2. Type mismatch. Required Expression, Found String - after "Medical Incident"

When I try to compile my code, I get the following error:

[error] /Users/xxxxxxx/Workspace/Learning/Spark/learning-spark/src/main/scala/org/example/chapter3/DepartmentCalls.scala:62:28: type mismatch;
[error]  found   : String("Medical Incident")
[error]  required: org.apache.spark.sql.catalyst.expressions.Expression
[error]     .where($"CallType" =!= "Medical Incident")
[error]                            ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed

CodePudding user response:

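You're probably missing an import in scope at the call site. The $"columnName" shorthand is normally brought in by import spark.implicits._, where spark is your SparkSession value, so the import has to appear after that val is created. IntelliJ often removes this import when 'Optimize Imports' is enabled, because it doesn't recognise that it is in use.

The compiler error also suggests that $ is currently being resolved through the org.apache.spark.sql.catalyst.dsl.expressions helpers in your import list. Their =!= expects a Catalyst Expression rather than a String, which matches the "required: ...catalyst.expressions.Expression" message, so that import is probably worth removing as well.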
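As a minimal sketch of that import in place (assuming a local run; the object name DollarSyntaxSketch, the helper nonMedical, the app name, and the sample rows are all made up for illustration and are not from the question):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical names throughout; only the column names come from the question.
object DollarSyntaxSketch {

  // Keeps every row whose CallType is not "Medical Incident".
  def nonMedical(spark: SparkSession, df: DataFrame): DataFrame = {
    // Must be imported from a SparkSession *value*; this is what enables $"..."
    import spark.implicits._
    df.select("IncidentNumber", "AvailableDtTm", "CallType")
      .where($"CallType" =!= "Medical Incident")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("DollarSyntaxSketch")
      .master("local[*]") // assumption: local mode, just for this sketch
      .getOrCreate()
    import spark.implicits._ // also needed here for toDF on a Seq

    // Made-up sample rows standing in for fire_incidents.csv
    val sample = Seq(
      (1, "sample-time-1", "Medical Incident"),
      (2, "sample-time-2", "Structure Fire")
    ).toDF("IncidentNumber", "AvailableDtTm", "CallType")

    nonMedical(spark, sample).show(5, false)
    spark.stop()
  }
}
```

Note that nothing from org.apache.spark.sql.catalyst is imported here. With only spark.implicits._ in scope, $"CallType" should be a Column, and comparing it with a String literal via =!= should compile.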
