I'm trying to run an example from the book Learning Spark.
It references a column inside a where expression like this:
val fewFireDF = fireDF
  .select("IncidentNumber", "AvailableDtTm", "CallType")
  .where($"CallType" =!= "Medical Incident")
But IntelliJ IDEA doesn't understand $"CallType"; it treats it as a plain string.
These variations work well:
.where(col("CallType") =!= "Medical Incident")
.where("CallType != 'Medical Incident'")
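For comparison, here is a minimal sketch that applies all three filter styles to a tiny in-memory DataFrame. The sample rows, the object name WhereForms and the "local[*]" master are assumptions for illustration, not data or code from the book. The $"..." form only compiles because of import spark.implicits._, which imports members of the spark instance and therefore has to come after it is defined:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object WhereForms {
  // Applies the three filter styles from the question and returns each row count.
  def filteredCounts(spark: SparkSession): Seq[Long] = {
    // Brings the $"..." interpolator (which yields a Column) and toDF into scope.
    // It imports members of the `spark` instance, so it must follow its definition.
    import spark.implicits._

    // Hypothetical sample rows standing in for the fire-calls CSV.
    val fireDF = Seq(
      (1, "Medical Incident"),
      (2, "Structure Fire"),
      (3, "Alarms")
    ).toDF("IncidentNumber", "CallType")

    val viaDollar = fireDF.where($"CallType" =!= "Medical Incident")
    val viaCol = fireDF.where(col("CallType") =!= "Medical Incident")
    val viaSql = fireDF.where("CallType != 'Medical Incident'")

    Seq(viaDollar, viaCol, viaSql).map(_.count())
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" is assumed so the sketch runs without spark-submit.
    val spark = SparkSession.builder
      .appName("WhereForms")
      .master("local[*]")
      .getOrCreate()
    try {
      println(filteredCounts(spark))
    } finally {
      spark.stop()
    }
  }
}
```

Each of the three filters should keep the same two non-medical rows, so the choice between them is mostly a matter of style.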
UPDATE: It seems I didn't explain my problem clearly.
Here is my code:
package org.example.chapter3

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.dsl.expressions.{DslExpression, StringToAttributeConversionHelper}
import org.apache.spark.sql.types.{BooleanType, FloatType, IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.functions._

object DepartmentCalls extends App {
  val spark = SparkSession
    .builder
    .appName("DepartmentCalls")
    .getOrCreate()

  if (args.length < 1) {
    println("usage DepartmentCalls <file path to fire_incidents.csv>")
    System.exit(1)
  }

  val schema = StructType(
    Array(
      StructField("CallNumber", IntegerType),
      StructField("UnitID", StringType),
      StructField("IncidentNumber", IntegerType),
      StructField("CallType", StringType),
      StructField("CallDate", StringType),
      StructField("WatchDate", StringType),
      StructField("CallFinalDisposition", StringType),
      StructField("AvailableDtTm", StringType),
      StructField("Address", StringType),
      StructField("City", StringType),
      StructField("Zipcode", IntegerType),
      StructField("Battalion", StringType),
      StructField("StationArea", StringType),
      StructField("Box", StringType),
      StructField("OriginalPriority", StringType),
      StructField("Priority", StringType),
      StructField("FinalPriority", IntegerType),
      StructField("ALSUnit", BooleanType),
      StructField("CallTypeGroup", StringType),
      StructField("NumAlarms", IntegerType),
      StructField("UnitType", StringType),
      StructField("UnitSequenceInCallDispatch", IntegerType),
      StructField("FirePreventionDistrict", StringType),
      StructField("SupervisorDistrict", StringType),
      StructField("Neighborhood", StringType),
      StructField("Location", StringType),
      StructField("RowID", StringType),
      StructField("Delay", FloatType)
    )
  )

  // Read the file using the CSV DataFrameReader
  val sfFireFile = args(0)
  val fireDF = spark.read.schema(schema)
    .option("header", "true")
    .csv(sfFireFile)

  println(fireDF.count())

  val fewFireDF = fireDF
    .select("IncidentNumber", "AvailableDtTm", "CallType")
    .where($"CallType" =!= "Medical Incident")

  fewFireDF.show(5, false)
}
IntelliJ shows the following errors:
- Cannot resolve overloaded method 'where'
- Type mismatch. Required: Expression, found: String (on "Medical Incident")
When I try to compile my code, I get the following error:
[error] /Users/xxxxxxx/Workspace/Learning/Spark/learning-spark/src/main/scala/org/example/chapter3/DepartmentCalls.scala:62:28: type mismatch;
[error]  found   : String("Medical Incident")
[error]  required: org.apache.spark.sql.catalyst.expressions.Expression
[error] .where($"CallType" =!= "Medical Incident")
[error]                        ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
Answer:
You're probably missing an import in scope at the call site. The $"<column name>" shortcut is normally brought in with import spark.implicits._, where spark is your SparkSession instance, so the import has to come after that val is defined. IntelliJ often removes this import if you have 'Optimize imports' enabled, because it doesn't recognise that the import is in use.
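Applied to the program in the question, a sketch of the fix could look like the following. It rests on an assumption worth checking: that the org.apache.spark.sql.catalyst.dsl.expressions import was added by the IDE's auto-import. That import supplies its own $ interpolator and =!= operator that work on Catalyst Expressions rather than Columns, which would be consistent with the "required: org.apache.spark.sql.catalyst.expressions.Expression" message. The sketch drops that import and adds import spark.implicits._ right after the session is created. Only to keep the listing short, it replaces the StructType with an equivalent DDL string passed to DataFrameReader.schema(String) (available since Spark 2.3), and it uses an explicit main method instead of extends App, as the Spark Quick Start recommends. As in the question, the master is expected to come from spark-submit or the run configuration:

```scala
import org.apache.spark.sql.SparkSession
// The catalyst DSL import from the question is deliberately absent: its $ and =!=
// build Catalyst Expressions, not Columns.

object DepartmentCalls {
  // Same 28 columns and types as the StructType in the question, as a DDL string.
  val schemaDDL: String =
    "CallNumber INT, UnitID STRING, IncidentNumber INT, CallType STRING, " +
      "CallDate STRING, WatchDate STRING, CallFinalDisposition STRING, " +
      "AvailableDtTm STRING, Address STRING, City STRING, Zipcode INT, " +
      "Battalion STRING, StationArea STRING, Box STRING, OriginalPriority STRING, " +
      "Priority STRING, FinalPriority INT, ALSUnit BOOLEAN, CallTypeGroup STRING, " +
      "NumAlarms INT, UnitType STRING, UnitSequenceInCallDispatch INT, " +
      "FirePreventionDistrict STRING, SupervisorDistrict STRING, " +
      "Neighborhood STRING, Location STRING, RowID STRING, Delay FLOAT"

  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      println("usage: DepartmentCalls <file path to fire_incidents.csv>")
      System.exit(1)
    }

    val spark = SparkSession.builder
      .appName("DepartmentCalls")
      .getOrCreate()

    // Must come after `spark` is defined: it imports members of that instance,
    // including the $"..." interpolator that produces a Column.
    import spark.implicits._

    // Read the file using the CSV DataFrameReader
    val fireDF = spark.read
      .schema(schemaDDL)
      .option("header", "true")
      .csv(args(0))

    println(fireDF.count())

    val fewFireDF = fireDF
      .select("IncidentNumber", "AvailableDtTm", "CallType")
      .where($"CallType" =!= "Medical Incident")

    fewFireDF.show(5, false)
    spark.stop()
  }
}
```

If IntelliJ keeps deleting the import, disabling 'Optimize imports on the fly' for the project should stop it from being removed again.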