I have a dataframe as below
val data = Seq(
"""{"Data": [{ "name": "FName", "value": "Alex" }, { "name": "LName", "value": "Johnson" }]}""",
"""{"Data": [{ "name": "FName", "value": "Alexis" }, { "name": "LName", "value": "Paul" }]}""",
"""{"Data": [{ "name": "FName", "value": "Alexander" }, { "name": "LName", "value": "Strong" }]}""",
"""{"Data": [{ "name": "FName", "value": "Baron" }, { "name": "LName", "value": "Corbin" }]}""",
)
val df = spark.read.json(spark.sparkContext.parallelize(data))
df.createOrReplaceTempView("df")
Schema as below
root
|-- Data: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- name: string (nullable = true)
| | |-- value: string (nullable = true)
Data output for above df as below
Data
[{"name":"FName","value":"Alex"},{"name":"LName","value":"Johnson"}]
[{"name":"FName","value":"Alexis"},{"name":"LName","value":"Paul"}]
[{"name":"FName","value":"Alexander"},{"name":"LName","value":"Strong"}]
[{"name":"FName","value":"Baron"},{"name":"LName","value":"Corbin"}]
I need all records with Fname that starts with 'Alex'
Expected Output
Data
[{"name":"FName","value":"Alex"},{"name":"LName","value":"Johnson"}]
[{"name":"FName","value":"Alexis"},{"name":"LName","value":"Paul"}]
[{"name":"FName","value":"Alexander"},{"name":"LName","value":"Strong"}]
Spark SQL query 1:
select * from df where array_contains (Data.value, "Al%")
Spark SQL query 2:
select * from df where array_contains (Data.value, "Al*")
Both these queries results in empty.
Spark SQL query 3:
select * from df where array_contains (Data.value, "Alex")
Result:
Data
[{"name":"FName","value":"Alex"},{"name":"LName","value":"Johnson"}]
How can I do Like or regex on array_contains?
CodePudding user response:
Use exists
function instead:
select * from df where exists(Data.value, x -> x like 'Al%')