Spark convert json data to DataFrame using Scala


Input file one.txt:

[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]

Expected output:

a       b       c
1,11,1  2,12,2  3,13,3

Could you please provide a solution using a Spark DataFrame in Scala?

val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()
val data = """[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]"""    // contents of one.txt
val df = spark.read.text("./src/main/scala/resources/text/one.txt")   // reads each line as a single string column

CodePudding user response:

This is the Python version of working Spark code. If you can convert it to Scala yourself, that is fine; otherwise let me know and I will do it.

from pyspark.sql.functions import col, collect_list, concat_ws

df = spark.read.json(sc.parallelize(['{"a":1,"b":2,"c":3}', '{"a":11,"b":12,"c":13}', '{"a":1,"b":2,"c":3}']))
df.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  2|  3|
| 11| 12| 13|
|  1|  2|  3|
+---+---+---+

df.agg(*[concat_ws(",",collect_list(col(i))).alias(i) for i in df.columns]).show()
+------+------+------+
|     a|     b|     c|
+------+------+------+
|1,11,1|2,12,2|3,13,3|
+------+------+------+
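The core transpose-and-join logic is independent of Spark; as a plain-Scala sanity check of the expected output (the `Map`-based record representation here is illustrative, not part of the original code):

```scala
// Each input record as a Map, mirroring the three JSON objects
val rows = List(
  Map("a" -> 1, "b" -> 2, "c" -> 3),
  Map("a" -> 11, "b" -> 12, "c" -> 13),
  Map("a" -> 1, "b" -> 2, "c" -> 3)
)
val columns = List("a", "b", "c")

// For each column, collect its values across all rows and join with commas
val joined: Map[String, String] =
  columns.map(c => c -> rows.map(_(c)).mkString(",")).toMap
// joined("a") == "1,11,1", joined("b") == "2,12,2", joined("c") == "3,13,3"
```

This is exactly what `collect_list` followed by `concat_ws(",", ...)` does per column in the Spark versions.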

For Scala Spark:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()
import spark.implicits._

val jsonStr = """[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]"""

// createDataset wraps the JSON string in a Dataset[String] that spark.read.json can parse
val df = spark.read.json(spark.createDataset(jsonStr :: Nil))

// concat_ws + collect_list joins each column's values into one comma-separated string,
// matching the expected output (collect_list alone would give array columns)
val exprs = df.columns.map(c => concat_ws(",", collect_list(col(c))).alias(c))
df.agg(exprs.head, exprs.tail: _*).show()
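If the JSON array really lives in one.txt (as in the question), Spark can parse the file directly with the `multiLine` reader option instead of embedding the string in code. A sketch, assuming Spark 2.2+ where `multiLine` is available and reusing the asker's path:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()

// multiLine lets spark.read.json parse a JSON array that spans the whole file
val df = spark.read
  .option("multiLine", value = true)
  .json("./src/main/scala/resources/text/one.txt")

val exprs = df.columns.map(c => concat_ws(",", collect_list(col(c))).alias(c))
df.agg(exprs.head, exprs.tail: _*).show()
```

Without `multiLine`, `spark.read.json` expects one JSON object per line (JSON Lines format) and would return a corrupt-record column for this file.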