Filter string with quotes in a Spark DataFrame column


I have a DF with this data:

+--------+------------------------------------------+
|recType |value                                     |
+--------+------------------------------------------+
|{"id": 1|{"id": 1, "user_id": 100, "price": 50}    |
...

I can filter recType with contains, but how do I do it with === and quotes? I get an error every time.

CodePudding user response:

I understand that the columns here are strings. If so, the from_json function can parse them into structs.

import org.apache.spark.sql.types.{StructField, StructType, IntegerType}
import org.apache.spark.sql.functions.from_json
import spark.implicits._  // for the $"col" syntax (assumes an active SparkSession named spark)

// Schemas describing the JSON stored in each string column
val recTypeSchema = StructType(Array(
    StructField("id", IntegerType, true)
))
val valueSchema = StructType(Array(
    StructField("id", IntegerType, true),
    StructField("user_id", IntegerType, true),
    StructField("price", IntegerType, true)
))

// Parse the JSON strings into struct columns
val parsedDf = df
    .withColumn("recType", from_json($"recType", recTypeSchema))
    .withColumn("value", from_json($"value", valueSchema))

parsedDf.printSchema
root
 |-- recType: struct (nullable = true)
 |    |-- id: integer (nullable = true)
 |-- value: struct (nullable = true)
 |    |-- id: integer (nullable = true)
 |    |-- user_id: integer (nullable = true)
 |    |-- price: integer (nullable = true)


parsedDf.filter($"recType.id" === 1).show
+-------+------------+
|recType|       value|
+-------+------------+
|    {1}|{1, 100, 50}|
+-------+------------+
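
As a side note on the original question about === and quotes: if you really want an exact match on the raw JSON string instead of parsing it, you can escape the embedded quotes or use a Scala triple-quoted literal. This is only a sketch and assumes the value column holds exactly the text shown above, spacing included:

// Exact match on the raw string; triple quotes avoid escaping the inner double quotes
df.filter($"value" === """{"id": 1, "user_id": 100, "price": 50}""").show(false)

// Equivalent with escaped quotes
df.filter($"value" === "{\"id\": 1, \"user_id\": 100, \"price\": 50}").show(false)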