Extract JSON objects using Scala (Have Duplicates Keys)

Time:11-30

I have a sample JSON document like the one below, in which the key "context" appears more than once:

 {
    "Production": {
        "meta_id": "1239826",
        "endeca_id": "EN29826",
        "Title": "Use The Google Home ™ To Choose The Right CCCM Solution For Your Needs",
        "Subtitle": null,
        "context": {
            "researchID": "22",
            "researchtitle": " The Google Home ™: Cross-Channel , Q4 2019",
            "telconfdoclinkid": null
        },
        "context": {
            "researchID": "281",
            "researchtitle": " The Google Home ™: Cross-Channel  Q3 2019",
            "telconfdoclinkid": null
        },
        "context": {
            "researchID": "154655",
            "researchtitle": " Now Tech: Cross-Channel Campaign Management, Q2 2019",
            "telconfdoclinkid": null
        },
        "uri": "/doc/uri",
        "ssd": "ihdfiuhdl",
        "id": "dsadfsd221e"
     }
 }

When I parse this JSON in Scala to read the "context" field, the parser rejects it with the following error:

Exception in thread "main" org.json.JSONException: Duplicate key "context".

Could you suggest the best approach for parsing JSON in this format with Scala?

CodePudding user response:

Some JSON parsers for Scala that parse from JSON bytes to your data structures can parse duplicated keys using custom codecs.

Below is an example of how it can be done with jsoniter-scala:

Add dependencies to your build.sbt:

libraryDependencies ++= Seq(
  // Use the %%% operator instead of %% for Scala.js  
  "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core"   % "2.12.0",
  // Use the "provided" scope instead when the "compile-internal" scope is not supported  
  "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % "2.12.0" % "compile-internal"
)

Use data structures and the custom codec from the following snippet:

import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._

object Example01 {
  case class Context(researchID: String, researchtitle: String, telconfdoclinkid: Option[String])

  sealed trait Issue

  case class Production(
    meta_id: String,
    endeca_id: String,
    Title: String,
    Subtitle: Option[String],
    contexts: List[Context],
    uri: String,
    ssd: String,
    id: String) extends Issue

  implicit val contextCodec: JsonValueCodec[Context] = JsonCodecMaker.make
  implicit val productionCodec: JsonValueCodec[Production] =
    new JsonValueCodec[Production] {
      def nullValue: Production = null

      def decodeValue(in: JsonReader, default: Production): Production = if (in.isNextToken('{')) {
        var _meta_id: String = null
        var _endeca_id: String = null
        var _Title: String = null
        var _Subtitle: Option[String] = None
        val _contexts = List.newBuilder[Context]
        var _uri: String = null
        var _ssd: String = null
        var _id: String = null
        var p0 = 255
        if (!in.isNextToken('}')) {
          in.rollbackToken()
          var l = -1
          while (l < 0 || in.isNextToken(',')) {
            l = in.readKeyAsCharBuf()
            if (in.isCharBufEqualsTo(l, "meta_id")) {
              if ((p0 & 1) != 0) p0 ^= 1
              else in.duplicatedKeyError(l)
              _meta_id = in.readString(_meta_id)
            } else if (in.isCharBufEqualsTo(l, "endeca_id")) {
              if ((p0 & 2) != 0) p0 ^= 2
              else in.duplicatedKeyError(l)
              _endeca_id = in.readString(_endeca_id)
            } else if (in.isCharBufEqualsTo(l, "Title")) {
              if ((p0 & 4) != 0) p0 ^= 4
              else in.duplicatedKeyError(l)
              _Title = in.readString(_Title)
            } else if (in.isCharBufEqualsTo(l, "Subtitle")) {
              if ((p0 & 8) != 0) p0 ^= 8
              else in.duplicatedKeyError(l)
              _Subtitle =
                if (in.isNextToken('n')) in.readNullOrError(_Subtitle, "expected value or null")
                else {
                  in.rollbackToken()
                  new Some(in.readString(null))
                }
            } else if (in.isCharBufEqualsTo(l, "context")) {
              // "context" may repeat: clear its bit without raising a duplicate-key error
              p0 &= ~16
              _contexts += contextCodec.decodeValue(in, contextCodec.nullValue)
            } else if (in.isCharBufEqualsTo(l, "uri")) {
              if ((p0 & 32) != 0) p0 ^= 32
              else in.duplicatedKeyError(l)
              _uri = in.readString(_uri)
            } else if (in.isCharBufEqualsTo(l, "ssd")) {
              if ((p0 & 64) != 0) p0 ^= 64
              else in.duplicatedKeyError(l)
              _ssd = in.readString(_ssd)
            } else if (in.isCharBufEqualsTo(l, "id")) {
              if ((p0 & 128) != 0) p0 ^= 128
              else in.duplicatedKeyError(l)
              _id = in.readString(_id)
            } else in.skip()
          }
          if (!in.isCurrentToken('}')) in.objectEndOrCommaError()
        }
        if ((p0 & 247) != 0) in.requiredFieldError(f0(java.lang.Integer.numberOfTrailingZeros(p0 & 247)))
        new Production(meta_id = _meta_id, endeca_id = _endeca_id, Title = _Title, Subtitle = _Subtitle, contexts = _contexts.result(), uri = _uri, ssd = _ssd, id = _id)
      } else in.readNullOrTokenError(default, '{')

      def encodeValue(x: Production, out: JsonWriter): Unit = {
        out.writeObjectStart()
        out.writeNonEscapedAsciiKey("meta_id")
        out.writeVal(x.meta_id)
        out.writeNonEscapedAsciiKey("endeca_id")
        out.writeVal(x.endeca_id)
        out.writeNonEscapedAsciiKey("Title")
        out.writeVal(x.Title)
        x.Subtitle match {
          case Some(s) =>
            out.writeNonEscapedAsciiKey("Subtitle")
            out.writeVal(s)
          case None => // omit the key when there is no subtitle
        }
        x.contexts.foreach { c =>
          out.writeNonEscapedAsciiKey("context")
          contextCodec.encodeValue(c, out)
        }
        out.writeNonEscapedAsciiKey("uri")
        out.writeVal(x.uri)
        out.writeNonEscapedAsciiKey("ssd")
        out.writeVal(x.ssd)
        out.writeNonEscapedAsciiKey("id")
        out.writeVal(x.id)
        out.writeObjectEnd()
      }

      private[this] def f0(i: Int): String = ((i: @annotation.switch): @unchecked) match {
        case 0 => "meta_id"
        case 1 => "endeca_id"
        case 2 => "Title"
        case 3 => "Subtitle"
        case 4 => "context"
        case 5 => "uri"
        case 6 => "ssd"
        case 7 => "id"
      }
    }
  implicit val issueCodec: JsonValueCodec[Issue] = JsonCodecMaker.make(CodecMakerConfig.withDiscriminatorFieldName(None))

  def main(args: Array[String]): Unit = {
    val issue = readFromArray[Issue](
      """
        | {
        |    "Production": {
        |        "meta_id": "1239826",
        |        "endeca_id": "EN29826",
        |        "Title": "Use The Google Home &trade To Choose The Right CCCM Solution For Your Needs",
        |        "Subtitle": null,
        |        "context": {
        |            "researchID": "22",
        |            "researchtitle": " The Google Home ™: Cross-Channel , Q4 2019",
        |            "telconfdoclinkid": null
        |        },
        |        "context": {
        |            "researchID": "281",
        |            "researchtitle": " The Google Home ™: Cross-Channel  Q3 2019",
        |            "telconfdoclinkid": null
        |        },
        |        "context": {
        |            "researchID": "154655",
        |            "researchtitle": " Now Tech: Cross-Channel Campaign Management, Q2 2019",
        |            "telconfdoclinkid": null
        |        },
        |        "uri": "/doc/uri",
        |        "ssd": "ihdfiuhdl",
        |        "id": "dsadfsd221e"
        |     }
        | }
        |""".stripMargin.getBytes("UTF-8"))
    println(issue)
  }
}

Expected output:

Production(1239826,EN29826,Use The Google Home &trade To Choose The Right CCCM Solution For Your Needs,None,List(Context(22, The Google Home ™: Cross-Channel , Q4 2019,None), Context(281, The Google Home ™: Cross-Channel  Q3 2019,None), Context(154655, Now Tech: Cross-Channel Campaign Management, Q2 2019,None)),/doc/uri,ihdfiuhdl,dsadfsd221e)
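The custom codec above tracks which keys it has already seen in the bitmask p0: every field owns one bit, a bit is cleared when its key is read, a second occurrence of a non-repeatable key is an error, and any required bit still set at the end means a missing field. The sketch below isolates that bookkeeping with the standard library only. The object name RequiredFieldMask and the check helper are hypothetical and are not part of jsoniter-scala.

```scala
object RequiredFieldMask {
  // Bit i of the mask corresponds to names(i), in the same order as the codec above.
  val names: Vector[String] =
    Vector("meta_id", "endeca_id", "Title", "Subtitle", "context", "uri", "ssd", "id")
  val optional: Set[String] = Set("Subtitle")   // may be absent
  val repeatable: Set[String] = Set("context")  // may occur many times

  // Bits that must be cleared once parsing ends (247 for the fields above).
  val requiredMask: Int =
    names.zipWithIndex.collect { case (n, i) if !optional(n) => 1 << i }.foldLeft(0)(_ | _)

  /** Replays a sequence of keys and reports the first problem, if any. */
  def check(keys: Seq[String]): Either[String, Unit] = {
    val start: Either[String, Int] = Right((1 << names.size) - 1) // all bits set: 255
    val end = keys.foldLeft(start) { (acc, key) =>
      acc.flatMap { pending =>
        val i = names.indexOf(key)
        if (i < 0) Right(pending) // unknown key: skipped
        else {
          val bit = 1 << i
          if (repeatable(key)) Right(pending & ~bit)          // clear the bit, never fail
          else if ((pending & bit) != 0) Right(pending ^ bit) // first occurrence
          else Left(s"duplicated key: $key")                  // second occurrence
        }
      }
    }
    end.flatMap { pending =>
      val missing = pending & requiredMask
      if (missing == 0) Right(())
      else Left(s"missing required field: ${names(Integer.numberOfTrailingZeros(missing))}")
    }
  }
}
```

Calling RequiredFieldMask.check with the key order of the sample document succeeds, while a repeated "meta_id" or an object without any "context" key yields an error message.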

CodePudding user response:

Json4s can parse duplicate keys:

scala> import org.json4s.native.JsonMethods._
import org.json4s.native.JsonMethods._

scala> parse("""{ "hello": true, "context": { "value": "A"}, "context": { "value": "B" }}""")
res2: org.json4s.JValue = JObject(List((hello,JBool(true)), (context,JObject(List((value,JString(A))))), (context,JObject(List((value,JString(B)))))))

Here's the documentation for json4s
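Because json4s keeps the fields of a JObject as a list of (name, value) pairs, the repeated "context" entries can be collected by pattern matching on that list. This is a minimal sketch assuming the json4s-native module is on the classpath; the object name Json4sDuplicateKeys and the helpers valuesFor and contextValues are made up for illustration.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

object Json4sDuplicateKeys {
  // Collects every value stored under `key` in a JSON object, in document order.
  def valuesFor(obj: JValue, key: String): List[JValue] = obj match {
    case JObject(fields) => fields.collect { case (`key`, value) => value }
    case _               => Nil
  }

  // Parses the text and returns the "value" member of each "context" object.
  def contextValues(text: String): List[JValue] =
    valuesFor(parse(text), "context").map(_ \ "value")

  def main(args: Array[String]): Unit = {
    val text = """{ "hello": true, "context": { "value": "A"}, "context": { "value": "B" }}"""
    println(contextValues(text))
  }
}
```

For the document in the question, the same helper can be applied to the object under "Production", for example valuesFor(json \ "Production", "context").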
