Any workaround for JSONPATH wildcard not supported in Spark SQL

Time:02-19

spark.sql("""select get_json_object('{"k":{"value":"abc"}}', '$.*.value') as j""").show()

This returns null, while it should return 'abc'. It works if I replace * with k.

I'm aware of the limited JSONPath support (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object),

but is there a way to achieve this in Spark SQL?

CodePudding user response:

There is a Spark JIRA "Any depth search not working in get_json_object ($..foo)" open for full JsonPath support.

Until it is resolved, I'm afraid creating a UDF that wraps a general-purpose JsonPath implementation may be the only option:

> spark-shell --jars "json-path-2.7.0.jar"                      
:
scala> import com.jayway.jsonpath.JsonPath
import com.jayway.jsonpath.JsonPath

scala> val r = JsonPath.read[net.minidev.json.JSONArray]("""
     | {"k":{"value":"abc"},
     |  "m":{"value":"def"}}
     | ""","$.*.value")
r: net.minidev.json.JSONArray = ["abc","def"]

scala> 
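To call this from Spark SQL itself, the JsonPath call above can be registered as a UDF. A minimal sketch, assuming json-path (and its net.minidev dependency) is on the classpath; the function name `json_path_all` is our own choice, not a built-in Spark function:

```scala
import com.jayway.jsonpath.JsonPath
import scala.jdk.CollectionConverters._

// Register a UDF that evaluates an arbitrary JsonPath expression and
// returns all matches as an array of strings. Invalid JSON or paths
// yield an empty array rather than an error.
spark.udf.register("json_path_all", (json: String, path: String) => {
  try {
    val result = JsonPath.read[net.minidev.json.JSONArray](json, path)
    result.asScala.map(String.valueOf).toSeq
  } catch {
    case _: Exception => Seq.empty[String]
  }
})

// The original query, now with wildcard support:
spark.sql("""select json_path_all('{"k":{"value":"abc"}}', '$.*.value') as j""").show()
```

Note that the UDF returns an array, since a wildcard or recursive-descent path can match multiple values; use `element_at(j, 1)` or similar to pull out a single element.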