I have a column features
in a table with the follwing structure
|-- features: struct (nullable = true)
| |-- tectonFeatures: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- featureName: string (nullable = true)
| | | |-- results: array (nullable = true)
| | | | |-- element: string (containsNull = true)
How do I write a SQL query that gets me any row where featureName = 'a'
and results is not empty
?
CodePudding user response:
With this schema:
root
|-- features: struct (nullable = true)
| |-- tectonFeatures: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- featureName: string (nullable = true)
| | | |-- results: array (nullable = true)
| | | | |-- element: string (containsNull = true)
and these rows:
--------------------
|features |
--------------------
|{[{a, [Something]}]}|
|{[{b, []}]} |
--------------------
this SQL statement keeps any row that contains at least one pair of featureName = a
and size(results) > 0
:
select * from test where size(filter(features.tectonFeatures, x -> x.featureName = 'a' and size(x.results) > 0)) > 0
Final result:
--------------------
|features |
--------------------
|{[{a, [Something]}]}|
--------------------
Good luck!