Converting to RDD fails


My code is below. I read a CSV file which has three columns, then loop through the elements of the DataFrame by converting it to an RDD. Now I want to create a DataFrame from each element. The code below fails. Can anyone please help?

    val df1 = spark.read.format("csv").load("c:\\file.csv") // CSV has 3 columns

    for (row <- df1.rdd.collect)
    {
      var tab1 = row.mkString(",").split(",")(0) // Has table name
      var tab2 = row.mkString(",").split(",")(1) // One select statement
      var tab3 = row.mkString(",").split(",")(2) // Another select statement

      val newdf = spark.createDataFrame(tab1).toDF("Col") // This is not working
    }
    

I want to join the tab2 DataFrame with tab3 and append the table name. For example:

Execution of the queries in tab2 and tab3 gives the result below:

Col1  Col2
----  ----
A     B
C     D
E     F
G     H

I want the result as below:

Col0  Col1  Col2
----  ----  ----
Tab1  A     B
Tab1  C     D
Tab2  E     F
Tab3  G     H

Now, this tab1, tab2, tab3, etc. information is in the CSV file which I am reading. I want to convert that Col0 into a DataFrame so that I can query it in Spark SQL.
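
For illustration, assuming `tab2Df` and `tab3Df` are hypothetical DataFrames obtained by running the two select statements, and `tableName` holds the value read from the first CSV column, one possible way to get that shape is to add the name as a literal column and union the results:

    import org.apache.spark.sql.functions.lit

    // Add the table name as a constant Col0 column on each query result,
    // then stack the results so every row carries its source table name.
    val withName2 = tab2Df.withColumn("Col0", lit(tableName)).select("Col0", "Col1", "Col2")
    val withName3 = tab3Df.withColumn("Col0", lit(tableName)).select("Col0", "Col1", "Col2")
    val combined  = withName2.union(withName3)

    combined.createOrReplaceTempView("combined") // make it queryable from Spark SQL
    spark.sql("SELECT * FROM combined").show()

With a different table name per loop iteration, the Col0 values would come out as Tab1, Tab2, Tab3 as in the example above.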

CodePudding user response:

I was able to resolve it by replacing the line below:

    val newdf = spark.createDataFrame(tab1).toDF("Col") // This is not working

By

    val newDf = spark.sparkContext.parallelize(Seq(tab1)).toDF("Col")
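
For completeness, here is a minimal sketch of the loop with that fix applied, assuming `spark` is an active SparkSession (calling `toDF` on an RDD needs `import spark.implicits._`):

    import spark.implicits._

    val df1 = spark.read.format("csv").load("c:\\file.csv")

    for (row <- df1.rdd.collect) {
      val tableName = row.mkString(",").split(",")(0) // first column: table name
      // Wrap the single string in a Seq, parallelize it, and name the column.
      val newDf = spark.sparkContext.parallelize(Seq(tableName)).toDF("Col")
      newDf.show()
    }

Equivalently, `Seq(tableName).toDF("Col")` builds the same single-row DataFrame without going through an RDD.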