Spark SQL: how to query columns with nested JSON


I have a column features in a table with the following structure:

 |-- features: struct (nullable = true)
 |    |-- tectonFeatures: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- featureName: string (nullable = true)
 |    |    |    |-- results: array (nullable = true)
 |    |    |    |    |-- element: string (containsNull = true)

How do I write a SQL query that gets me any row where featureName = 'a' and results is not empty?

CodePudding user response:

With this schema:

root
 |-- features: struct (nullable = true)
 |    |-- tectonFeatures: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- featureName: string (nullable = true)
 |    |    |    |-- results: array (nullable = true)
 |    |    |    |    |-- element: string (containsNull = true)

and these rows:

+--------------------+
|features            |
+--------------------+
|{[{a, [Something]}]}|
|{[{b, []}]}         |
+--------------------+
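
To reproduce these rows, here is a minimal sketch in plain Spark SQL (the view name test matches the query below; the empty array is cast so both rows share the same element type):

-- build a two-row view matching the schema above
create or replace temporary view test as
select named_struct('tectonFeatures', array(
         named_struct('featureName', 'a', 'results', array('Something'))
       )) as features
union all
select named_struct('tectonFeatures', array(
         named_struct('featureName', 'b', 'results', cast(array() as array<string>))
       )) as features;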

this SQL statement keeps every row whose tectonFeatures array contains at least one element with featureName = 'a' and a non-empty results array:

select * from test
where size(filter(features.tectonFeatures, x -> x.featureName = 'a' and size(x.results) > 0)) > 0

Final result:

+--------------------+
|features            |
+--------------------+
|{[{a, [Something]}]}|
+--------------------+
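
If you are on Spark 2.4 or later, the exists higher-order function expresses the same predicate more directly, without building the filtered array just to take its size:

-- true as soon as one element matches
select * from test
where exists(features.tectonFeatures, x -> x.featureName = 'a' and size(x.results) > 0)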

Good luck!
