Home > database >  How to read field from nested json?
How to read field from nested json?

Time:01-01

this is my test json file.

{
  "item" : {
    "fracData" : [ ],
    "fractimeData" : [ {
      "number" : "1232323232",
      "timePeriods" : [ {
        "validFrom" : "2021-08-03"
      } ]
    } ],
    "Module" : [ ]
  }
}

This is how I read the json file.

starhist_test_df = spark.read.json("/mapr/xxx/yyy/ttt/dev/rawdata/Test.json", multiLine=True)
starhist_test_df.createOrReplaceTempView("v_test_df")

This query works.

df_test_01 = spark.sql("""
 select item.fractimeData.number from v_test_df""")
df_test_01.collect();

Result

[Row(number=['1232323232'])]

But this query doesn't work.

df_test_01 = spark.sql("""
 select item.fractimeData.timePeriods.validFrom from v_test_df""")
df_test_01.collect();

Error

cannot resolve 'v_test_df.`item`.`fractimeData`.`timePeriods`['validFrom']' due to data type mismatch: argument 2 requires integral type, however, ''validFrom'' is of string type.; line 3 pos 0;

What do I have to change, to read the validFrom field?

CodePudding user response:

try built in function "transform".

It will resolve your query

CodePudding user response:

dot notation to access values works with struct or array<struct> types.

The schema for field number in item.fractimeData is string and accessing it via dot notation returns an array<string> since fractimeData is an array.

Similarly, the schema for field timePeriods in item.fractimeData is <array<struct<validFrom>>, and accessing it via dot notation wraps it into another array, resulting in final schema of array<array<struct<validFrom>>>.

The error you get is because the dot notation can work on array<struct> but not on array<array>.

Hence, flatten the result from item.fractimeData.timePeriods to get back an array<struct<validFrom>> and then apply the dot notation.

df_test_01 = spark.sql("""
 select flatten(item.fractimeData.timePeriods).validFrom as validFrom from v_test_df""")
df_test_01.collect()

"""
[Row(validFrom=['2021-08-03', '2021-08-03'])]
"""
  • Related