Pattern matching json lines using Circe and filtering based upon decoded case class value

Time:07-30

I have a very large file of JSON lines, which I intend to read into a list of case class instances. Due to the size of the file, rather than reading the entire file into a variable first and then filtering, I would like to filter while decoding each line, inside the pattern match on the decode result. Currently the code looks like this:

import scala.io.Source

import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode

case class Person(name: String, age: Int, country: String)

val personList: List[Person] =
    Source.fromResource("Persons.json").getLines.toList.map { line =>
      implicit val jsonDecoder: Decoder[Person] = deriveDecoder[Person]
      val decoded = decode[Person](line)
      decoded match {
        case Right(decodedJson) =>
          Person(
            decodedJson.name,
            decodedJson.age,
            decodedJson.country
          )
        case Left(ex) => throw new RuntimeException(ex)
      }
    }

However, if I wanted to include only Person instances with a country of "us", what would be the best way to accomplish this? Should I use nested pattern matching that specifically looks for Person(_, _, "us") (I'm not sure how I would accomplish this), or is there some way I can implement Option handling?

CodePudding user response:

You could do something like this:

import scala.io.Source

import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode

case class Person(name: String, age: Int, country: String)

implicit val jsonDecoder: Decoder[Person] = deriveDecoder[Person]

val personList: List[Person] =
  Source
    .fromResource("Persons.json")
    .getLines
    .flatMap { line =>
      val decoded = decode[Person](line)
      decoded match {
        case Right(person @ Person(_, _, "us")) => Some(person)
        case Right(_)                           => None
        case Left(ex) =>
          println(s"couldn't decode: $line, will skip (error: ${ex.getMessage})")
          None
      }
    }
    .toList

println(s"US people: $personList")

A few things to note:

  • I moved the .toList to the end. In your implementation, you called it right after .getLines, which loses the laziness of the Iterator: every line is loaded into memory before any decoding or filtering happens. Assuming there are only a few US people among a huge number of people in the JSON file, keeping it lazy helps memory use and performance.
  • Wrapping each iteration's result in an Option and using flatMap over the original Iterator is a handy way to filter and transform in a single pass: Some values are kept and None values are dropped.
  • I didn't throw an exception on a decoding error, but rather logged it and moved on with a None. You could also accumulate the errors and handle them after all iterations are done, if that's helpful to you.
  • The @ in person @ Person(_, _, "us") binds the whole matched object to a name (here person) while still matching on its structure.
  • As the comment on the original question noted, there is no need to re-instantiate the implicit Decoder on each iteration. You can just pull it one level up, as I did in my example.
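
To make the point about laziness concrete, here is a minimal sketch that uses only the standard library. The in-memory Iterator stands in for Source.fromResource(...).getLines, and the names LazinessSketch and pulled are made up for this illustration. It shows that building the flatMap pipeline consumes nothing, and that lines are only pulled once .toList runs.

```scala
object LazinessSketch {
  // Counts how many lines have actually been pulled from the source iterator.
  var pulled = 0

  // Stand-in for Source.fromResource(...).getLines: four "country" lines.
  val lines: Iterator[String] =
    Iterator("us", "de", "us", "fr").map { line => pulled += 1; line }

  // Building the pipeline does not consume any line yet.
  val usOnly: Iterator[String] =
    lines.flatMap(line => if (line == "us") Some(line) else None)
  val pulledBeforeToList: Int = pulled

  // Only the terminal .toList drives the iterator, one line at a time.
  val result: List[String] = usOnly.toList
  val pulledAfterToList: Int = pulled
}
```

With .toList placed right after the source instead, all four lines would sit in a List before any filtering happened, which is what hurts with a very large file.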
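
If you would rather accumulate the decoding errors than print them, one possible shape is a single foldLeft over the Iterator. This is only a sketch: decodeAndFilter is a hypothetical helper, not part of circe, and it takes the decoding function as a parameter so that it compiles without any library on the classpath.

```scala
case class Person(name: String, age: Int, country: String)

object DecodeSketch {
  // Hypothetical helper (not part of circe): walks the lines once and returns
  // the lines that failed to decode (with their error) alongside the decoded
  // values that satisfy `keep`. Decoded values that do not match are dropped.
  def decodeAndFilter[E, A](lines: Iterator[String])(
      decodeLine: String => Either[E, A]
  )(keep: A => Boolean): (Vector[(String, E)], Vector[A]) =
    lines.foldLeft((Vector.empty[(String, E)], Vector.empty[A])) {
      case ((failed, kept), line) =>
        decodeLine(line) match {
          case Right(value) if keep(value) => (failed, kept :+ value)
          case Right(_)                    => (failed, kept)
          case Left(error)                 => (failed :+ (line -> error), kept)
        }
    }
}
```

With the imports and the implicit Decoder from the example above in scope, the call would look something like DecodeSketch.decodeAndFilter(Source.fromResource("Persons.json").getLines())(decode[Person](_))(_.country == "us"), where the error type is circe's io.circe.Error. Only the matching people and the failed lines are kept in memory, so the file is still processed one line at a time.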