I have a DF with this data:
+--------+------------------------------------------+
|recType |value                                     |
+--------+------------------------------------------+
|{"id": 1|{"id": 1, "user_id": 100, "price": 50}    |
...
I can filter recType with contains, but how do I do it with === and quotes? I get an error every time.
Answer:
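I assume both columns here are strings. If so, the from_json function can parse them into structs, and you can then compare a struct field with === instead of matching the raw JSON text, which avoids the quoting problem altogether.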
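import org.apache.spark.sql.types.{StructField, StructType, IntegerType}
import org.apache.spark.sql.functions.from_json
import spark.implicits._ // enables the $"colName" syntax; assumes a SparkSession named spark, as in spark-shell

// Schema of the JSON stored in recType
val recTypeSchema = StructType(Array(
  StructField("id", IntegerType, true)
))
// Schema of the JSON stored in value
val valueSchema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("user_id", IntegerType, true),
  StructField("price", IntegerType, true)
))
// Replace each JSON string column with the parsed struct column
val parsedDf = df
  .withColumn("recType", from_json($"recType", recTypeSchema))
  .withColumn("value", from_json($"value", valueSchema))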
parsedDf.printSchema
root
 |-- recType: struct (nullable = true)
 |    |-- id: integer (nullable = true)
 |-- value: struct (nullable = true)
 |    |-- id: integer (nullable = true)
 |    |-- user_id: integer (nullable = true)
 |    |-- price: integer (nullable = true)
parsedDf.filter($"recType.id" === 1).show
+-------+------------+
|recType|       value|
+-------+------------+
|    {1}|{1, 100, 50}|
+-------+------------+
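If you would rather keep the columns as plain strings, the error with === is most likely a compile error from unescaped double quotes inside the string literal. Scala accepts either backslash escapes or a triple-quoted literal, and both produce the same string. A minimal sketch, assuming the recType cell holds exactly {"id": 1} (the closing brace is cut off in the table above) and using col instead of $ so no implicits are needed for the filter itself. Note that an exact string match breaks if the spacing differs, e.g. {"id":1}, which is one reason parsing with from_json is more robust.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RawStringFilter {
  // Two equivalent ways to write the literal {"id": 1} in Scala
  val escaped: String = "{\"id\": 1}"       // backslash-escaped quotes
  val tripleQuoted: String = """{"id": 1}""" // no escaping needed inside """..."""

  // Exact match on the raw JSON text; spacing must match character for character
  def filterByRawRecType(df: DataFrame): DataFrame =
    df.filter(col("recType") === tripleQuoted)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("raw-filter").getOrCreate()
    import spark.implicits._ // needed for Seq(...).toDF

    // Sample row taken from the question (closing brace of recType assumed)
    val df = Seq(
      (tripleQuoted, """{"id": 1, "user_id": 100, "price": 50}""")
    ).toDF("recType", "value")

    filterByRawRecType(df).show(false)
    spark.stop()
  }
}
```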