Home > Net >  Spark Sql Array contains on Regex - doesn't work
Spark Sql Array contains on Regex - doesn't work

Time:03-24

I have a dataframe as below

val data = Seq(
    """{"Data": [{ "name": "FName", "value": "Alex" }, { "name": "LName",   "value": "Johnson"  }]}""",
    """{"Data": [{ "name": "FName", "value": "Alexis" }, { "name": "LName",   "value": "Paul"  }]}""",
    """{"Data": [{ "name": "FName", "value": "Alexander" }, { "name": "LName",   "value": "Strong"  }]}""",
    """{"Data": [{ "name": "FName", "value": "Baron" }, { "name": "LName",   "value": "Corbin"  }]}""",
)
val df = spark.read.json(spark.sparkContext.parallelize(data))
df.createOrReplaceTempView("df")

Schema as below

 root
 |-- Data: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- value: string (nullable = true)

Data output for above df as below

Data
[{"name":"FName","value":"Alex"},{"name":"LName","value":"Johnson"}]
[{"name":"FName","value":"Alexis"},{"name":"LName","value":"Paul"}]
[{"name":"FName","value":"Alexander"},{"name":"LName","value":"Strong"}]
[{"name":"FName","value":"Baron"},{"name":"LName","value":"Corbin"}]

I need all records with Fname that starts with 'Alex'

Expected Output

Data
[{"name":"FName","value":"Alex"},{"name":"LName","value":"Johnson"}]
[{"name":"FName","value":"Alexis"},{"name":"LName","value":"Paul"}]
[{"name":"FName","value":"Alexander"},{"name":"LName","value":"Strong"}]

Spark SQL query 1:

select * from df where array_contains (Data.value, "Al%")

Spark SQL query 2:

select * from df where array_contains (Data.value, "Al*")

Both these queries results in empty.

Spark SQL query 3:

select * from df where array_contains (Data.value, "Alex")

Result:

Data
[{"name":"FName","value":"Alex"},{"name":"LName","value":"Johnson"}]

How can I do Like or regex on array_contains?

CodePudding user response:

Use exists function instead:

select * from df where exists(Data.value, x -> x like 'Al%')
  • Related