How to write JSON string to parquet, avro file in scala without spark

Time:02-10

I want to write a simple JSON string to the Parquet and Avro file formats in Scala without the Spark framework.

My JSON string looks like this:

    {"emp_id":"123","emp_name":"Mike","emp_status":"true"}

I did not find any solution for that. Is it possible to write Parquet and Avro files from a simple JSON string in Scala without the Spark framework?

CodePudding user response:

Here is an example.

build.sbt

ThisBuild / version := "0.1.0-SNAPSHOT"

ThisBuild / scalaVersion := "2.13.8"

lazy val root = (project in file("."))
  .settings(
    name := "parquet",
    libraryDependencies ++= Seq(
      "com.github.mjakubowski84" %% "parquet4s-core" % "2.1.0",
      "org.apache.hadoop" % "hadoop-client" % "3.3.1"
    ),
  )

Test.scala

import com.github.mjakubowski84.parquet4s.{ ParquetReader, ParquetWriter, Path }

object Test {
  // Case class mirroring the JSON fields; parquet4s derives the Parquet schema from it
  case class Emp(emp_id: String, emp_name: String, emp_status: String)

  def main(args: Array[String]): Unit = {
    val emps = Seq(
      Emp("123", "Mike", "true")
    )

    val path = Path("emp1.parquet")

    // Write the records to a Parquet file
    ParquetWriter.of[Emp].writeAndClose(path, emps)

    // Read the file back to verify its contents
    val parquetIterable = ParquetReader.as[Emp].read(path)
    try {
      parquetIterable.foreach(println)
    } finally parquetIterable.close()
  }
}
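The example above starts from a case class rather than the JSON string in the question. As a minimal sketch of bridging the two, the flat JSON shown can be mapped to an `Emp` case class with a naive regex extractor. The `JsonToEmp` object and `field` helper below are illustrative names, not part of any library; a real application should parse the JSON with a proper library such as circe or upickle instead.

```scala
object JsonToEmp {
  case class Emp(emp_id: String, emp_name: String, emp_status: String)

  // Naive extraction of a string-valued field from flat JSON like the
  // question's example; this does not handle nesting, escapes, or non-string values
  private def field(json: String, name: String): String = {
    val pattern = ("\"" + name + "\"\\s*:\\s*\"([^\"]*)\"").r
    pattern.findFirstMatchIn(json).map(_.group(1)).getOrElse("")
  }

  def parse(json: String): Emp =
    Emp(field(json, "emp_id"), field(json, "emp_name"), field(json, "emp_status"))
}
```

The resulting `Emp` values can then be passed to `ParquetWriter` exactly as in the example above.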

And the output

Emp(123,Mike,true)
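The question also asks about Avro. A similar sketch is possible with Avro's `GenericRecord` API, assuming a dependency such as `"org.apache.avro" % "avro" % "1.11.1"` is added to `libraryDependencies` (the file name `emp1.avro` is arbitrary):

```scala
import org.apache.avro.Schema
import org.apache.avro.file.{DataFileReader, DataFileWriter}
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import java.io.File

object AvroTest {
  def main(args: Array[String]): Unit = {
    // Avro schema matching the JSON fields
    val schemaJson =
      """{"type":"record","name":"Emp","fields":[
        |{"name":"emp_id","type":"string"},
        |{"name":"emp_name","type":"string"},
        |{"name":"emp_status","type":"string"}]}""".stripMargin
    val schema = new Schema.Parser().parse(schemaJson)

    // Build a record from the values in the JSON string
    val record: GenericRecord = new GenericData.Record(schema)
    record.put("emp_id", "123")
    record.put("emp_name", "Mike")
    record.put("emp_status", "true")

    // Write the record to an Avro container file
    val file = new File("emp1.avro")
    val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
    writer.create(schema, file)
    writer.append(record)
    writer.close()

    // Read the file back to verify its contents
    val reader = new DataFileReader[GenericRecord](file, new GenericDatumReader[GenericRecord]())
    while (reader.hasNext) println(reader.next())
    reader.close()
  }
}
```

Both parquet4s and the Avro library work directly on the JVM, so neither needs Spark.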