In Spark SQL, transform JSON key name into value

Time:11-09

It seems like there should be a function for this in Spark SQL, similar to pivoting, but I haven't found any solution for transforming a JSON key into a value. Suppose I have a badly formed JSON (whose format I cannot change):

{"A long string containing serverA": {"x": 1, "y": 2}}

how can I process it to

{"server": "A", "x": 1, "y": 2}

?

I read the JSONs into a sql.DataFrame and would then like to process them as described above:

val cs = spark.read.json("sample.json")
  .???

CodePudding user response:

If you want to use only Spark functions and no UDFs, you can use from_json to parse the JSON into a map (a schema must be specified). Then you just need to extract the information with Spark functions. One way to do it is as follows:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._ // enables the 'value column syntax below

// Every value in the map shares one struct schema: {"x": Int, "y": Int}
val schema = MapType(
    StringType,
    StructType(Array(
        StructField("x", IntegerType),
        StructField("y", IntegerType)
    ))
)

spark.read.text("...")
    .withColumn("json", from_json('value, schema))
    // The map contains a single entry; grab its key and its value
    .withColumn("key", map_keys('json).getItem(0))
    .withColumn("value", map_values('json).getItem(0))
    .withColumn("server",
        // Extracting the server name with a regex: "serverA" -> "A"
        regexp_replace(regexp_extract('key, "server[^ ]*", 0), "server", ""))
    .select("server", "value.*")
    .show(false)

which yields:

+------+---+---+
|server|x  |y  |
+------+---+---+
|A     |1  |2  |
+------+---+---+
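To see the transformation itself without a Spark session, here is a plain-Python sketch of the same steps (not Spark code; the `flatten` helper is a name chosen for illustration): parse the JSON, take the single key, and apply the same `server[^ ]*` regex used above to recover the server name.

```python
import json
import re

def flatten(raw: str) -> dict:
    """Illustrative helper: mirrors the Spark pipeline above in plain Python."""
    parsed = json.loads(raw)
    # The object has a single entry; split it into key and inner struct
    key, inner = next(iter(parsed.items()))
    # Same regex as in the Spark answer: grab "serverA", then strip "server"
    token = re.search(r"server[^ ]*", key).group(0)
    server = token.replace("server", "")
    return {"server": server, **inner}

print(flatten('{"A long string containing serverA": {"x": 1, "y": 2}}'))
# {'server': 'A', 'x': 1, 'y': 2}
```

The regex deliberately matches everything after `server` up to the next space, so suffixes longer than one character (e.g. `serverB2`) are also handled.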