I have the following code:
class MyTest extends AnyFlatSpec with Matchers {
  // ...
  it should "calculate" in {
    val testDf = Seq(
      testDf(1, "customer1", "Hi"),
      testDf(1, "customer2", "Hi")
    ).toDS().toDF()
    val out = MyClass.procOut(spark, testDf)
    out.count() should be(1)
    out.where(col("customer_id") === "customer1").first().getString(out.first().fieldIndex("json_col")) should be(?) // Here I have the problem!
  }
}
My issue: out is a DataFrame, and the expression
out.where(col("customer_id") === "customer1").first().getString(out.first().fieldIndex("json_col"))
extracts the JSON column, whose value is:
{
"statistics": {
"Group2": {
"buy": 1
}
}
}
My goal is to get the value of buy, i.e. to assert that the value in the JSON is 1. So I want to do something like:
out.where(col("customer_id") === "customer1").first().getString(out.first().fieldIndex("json_col")["statistics"]["Group2"]["buy"]) should be(1)
Obviously this is not the right syntax, but I can't find the right syntax to extract a specific value from the JSON.
CodePudding user response:
json_tuple
– Extracts fields from a JSON string and returns them as new columns.
val jsonString = """{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}"""
val data = Seq((1, jsonString))

import spark.implicits._
val df = data.toDF("id", "value")

import org.apache.spark.sql.functions.{col, json_tuple}
df.select(col("id"), json_tuple(col("value"), "Zipcode", "ZipCodeType", "City"))
  .toDF("id", "Zipcode", "ZipCodeType", "City")
  .show(false)
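Note that json_tuple only reaches top-level keys, so for the nested JSON in the question it can pull out the statistics object as a raw JSON string, but it has no path syntax to reach buy directly. A minimal sketch of that limitation, assuming a local SparkSession and a hand-built row shaped like the question's data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, json_tuple}

object JsonTupleNested {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("jsonTupleNested").getOrCreate()
    import spark.implicits._

    val json = """{"statistics":{"Group2":{"buy":1}}}"""
    val df = Seq(("customer1", json)).toDF("customer_id", "json_col")

    // json_tuple sees only top-level keys: "statistics" comes back as the
    // nested object serialized as a string, not as the inner "buy" value.
    val stats = df.select(json_tuple(col("json_col"), "statistics"))
      .first()
      .getString(0)
    println(stats)

    spark.stop()
  }
}
```

So to assert on buy with json_tuple alone you would have to parse the returned string again, which is why a JSON-path function is the better fit here.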
get_json_object
– Extracts a JSON element from a JSON string based on the specified JSON path.
import org.apache.spark.sql.functions.{col, get_json_object}
df.select(col("id"), get_json_object(col("value"), "$.ZipCodeType").as("ZipCodeType"))
  .show(false)
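Since the question's JSON is nested, get_json_object with a full JSON path is the direct fit for the assertion. A self-contained sketch (the local SparkSession and the literal test row are assumptions standing in for MyClass.procOut's output; note that get_json_object returns the matched value as a string, so the assertion compares against "1"):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, get_json_object}

object BuyAssertion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("buyAssertion").getOrCreate()
    import spark.implicits._

    // Stand-in for the "out" DataFrame from the question.
    val json = """{"statistics":{"Group2":{"buy":1}}}"""
    val out = Seq(("customer1", json)).toDF("customer_id", "json_col")

    // Walk the nested structure with a JSON path; the result is a string.
    val buy = out
      .where(col("customer_id") === "customer1")
      .select(get_json_object(col("json_col"), "$.statistics.Group2.buy").as("buy"))
      .first()
      .getString(0)

    assert(buy == "1")

    spark.stop()
  }
}
```

In the FlatSpec test this becomes a one-liner: select the get_json_object column after the where filter and write buy should be("1") (or buy.toInt should be(1) if you want a numeric comparison).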