My code is below. I read a CSV file that has three columns and loop through the rows of the DataFrame by converting it to an RDD. I now want to create a DataFrame from each element, but the code below fails. Can anyone please help?
val df1 = spark.read.format("csv").load("c:\\file.csv") // CSV has 3 columns
for (row <- df1.rdd.collect) {
  // Read each field by position; splitting on "," would break SELECT statements that contain commas
  val tab1 = row.getString(0) // table name
  val tab2 = row.getString(1) // one SELECT statement
  val tab3 = row.getString(2) // another SELECT statement
  val newdf = spark.createDataFrame(tab1).toDF("Col") // This is not working
}
I want to join the tab2 DataFrame with the tab3 DataFrame and add the table name to each row. For example, executing the queries in tab2 and tab3 gives the result below.
Col1  Col2
----  ----
A     B
C     D
E     F
G     H
I want the result to look like this:
Col0  Col1  Col2
----  ----  ----
Tab1  A     B
Tab1  C     D
Tab2  E     F
Tab3  G     H
The values tab1, tab2, tab3, etc. come from the CSV file that I am reading. I want to convert Col0 (the table name) into a DataFrame so that I can query it in Spark SQL.
CodePudding user response:
I was able to resolve this by replacing the line below:
val newdf = spark.createDataFrame(tab1).toDF("Col") // This is not working
with:
val newDf = spark.sparkContext.parallelize(Seq(tab1)).toDF("Col") // requires import spark.implicits._
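For anyone landing here later, the following is a minimal sketch of the whole flow, not a definitive implementation. The original line fails because createDataFrame expects a collection or RDD of rows or case-class/tuple values, not a bare String; wrapping the String in a Seq gives toDF something to work with. The desired output in the question looks more like a row-wise union than a join, so the sketch uses union. The object name TagAndUnion, the helper tagged, and the inline sample queries are hypothetical stand-ins for the rows read from the CSV file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object TagAndUnion {
  // Hypothetical helper: runs one query and puts the table name
  // in front of the result columns as a literal column "Col0".
  def tagged(spark: SparkSession, tableName: String, query: String): DataFrame = {
    val result = spark.sql(query)
    result.withColumn("Col0", lit(tableName)).select("Col0", result.columns: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for illustration
      .appName("tag-and-union")
      .getOrCreate()
    import spark.implicits._ // needed for .toDF on a Seq or RDD

    // A single String becomes a one-row, one-column DataFrame.
    val tab1 = "Tab1"
    val nameDf = Seq(tab1).toDF("Col")
    nameDf.show()

    // Assumed sample data standing in for the three CSV columns:
    // (table name, first SELECT statement, second SELECT statement).
    val rows = Seq(
      ("Tab1", "SELECT 'A' AS Col1, 'B' AS Col2", "SELECT 'C' AS Col1, 'D' AS Col2"),
      ("Tab2", "SELECT 'E' AS Col1, 'F' AS Col2", "SELECT 'G' AS Col1, 'H' AS Col2")
    )

    // Tag every query result with its table name, then stack the results.
    // union matches columns by position, so all queries must return
    // the same number of columns with compatible types.
    val combined = rows
      .flatMap { case (name, q1, q2) =>
        Seq(tagged(spark, name, q1), tagged(spark, name, q2))
      }
      .reduce(_ union _)

    combined.show()
    spark.stop()
  }
}
```

With the real file, rows would come from df1.collect() instead of the inline Seq. Collecting to the driver is reasonable here only because the CSV holds a small list of queries rather than the data itself.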