How to create a dataframe along with schema from the individual values

Time:11-20

I have some individual values that I need to convert into a DataFrame; the result should be a single row. I tried the following:

val matchingcount= 3
val notmatchingcount=5
val filename=h:/filename1

import spark.implicits._
val data=Seq(" filename "," matchingcount "," notmatchingcount ").toDF("ezfilename","match_count","non_matchcount")
data.show()

It throws this error:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (1): value
New column names (3): ezfilename, match_count, non_matchcount

Any help would be appreciated.

Answer:

You were almost there! The following code does what you want:

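val matchingcount = 3
val notmatchingcount = 5
val filename = "h:/filename1" // string literals need double quotes

// toDF on a Seq needs the implicits of an existing SparkSession named spark
import spark.implicits._
// One tuple = one row; each tuple field becomes a column
val data = Seq((filename, matchingcount, notmatchingcount)).toDF("ezfilename", "match_count", "non_matchcount")
data.show()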

+------------+-----------+--------------+
|  ezfilename|match_count|non_matchcount|
+------------+-----------+--------------+
|h:/filename1|          3|             5|
+------------+-----------+--------------+
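The question title also mentions the schema. With the tuple approach, Spark infers column types from the Scala types (String, Int, Int). If you would rather declare the schema yourself, one option is to build Row objects and pass an explicit StructType to createDataFrame. This is a minimal sketch assuming Spark 2.x or 3.x; the object name ExplicitSchemaExample, the helper build, and the local[*] master are illustrative choices, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ExplicitSchemaExample {
  // Explicit schema: column names, types and nullability are spelled out
  // instead of being inferred from a Scala tuple.
  val schema: StructType = StructType(Seq(
    StructField("ezfilename", StringType, nullable = false),
    StructField("match_count", IntegerType, nullable = false),
    StructField("non_matchcount", IntegerType, nullable = false)
  ))

  // Hypothetical helper: builds a one-row DataFrame from the individual values.
  def build(spark: SparkSession, filename: String, matching: Int, notMatching: Int): DataFrame = {
    // One Row per output row; the field order must match the schema.
    val rows = Seq(Row(filename, matching, notMatching))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session only so the sketch runs standalone; in spark-shell, reuse `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("explicit-schema").getOrCreate()
    val df = build(spark, "h:/filename1", 3, 5)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

The trade-off: the tuple version is shorter, while the StructType version makes the column types and nullability visible in one place and avoids depending on type inference.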

There are three key differences between your code and the code above:

  • In Scala, a string literal has to be surrounded by double quotes ("), so I've added them to val filename = "h:/filename1".
  • You were right that you can call toDF on a Seq after import spark.implicits._, but each element of the Seq becomes one row of the DataFrame. Your Seq held three strings, so Spark built a DataFrame with three rows and a single column named value, and then could not rename that one column to the three names you passed to toDF (hence "Old column names (1): value" in the error). To get three columns, put a tuple inside the Seq: note the difference between Seq(bla, bla, bla) (three rows, one column) and Seq((bla, bla, bla)) (one row, three columns); the latter is what you want. You can also create multiple rows this way: Seq((bla, bli, blu), (blo, ble, bly)).
  • In Scala, you access a variable's value by simply writing the variable's name. So filename, not " filename ", is the correct way to refer to it: the quoted version is just a string literal containing the word filename (with spaces around it).
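To see the Seq(bla, bla, bla) versus Seq((bla, bla, bla)) difference directly, here is a small sketch that builds both shapes and a multi-row variant. It assumes Spark 2.x or 3.x; the object and method names are made up for illustration, and the second file name "h:/filename2" with its counts is invented sample data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SeqVsTupleExample {
  // Seq of plain values: one row per element, a single column named "value".
  def singleColumn(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq("h:/filename1", "3", "5").toDF()
  }

  // Seq of tuples: one row per tuple, one column per tuple field.
  // Two tuples here, so two rows (the second row is invented sample data).
  def threeColumns(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("h:/filename1", 3, 5), ("h:/filename2", 7, 1))
      .toDF("ezfilename", "match_count", "non_matchcount")
  }

  def main(args: Array[String]): Unit = {
    // Local session only so the sketch runs standalone.
    val spark = SparkSession.builder().master("local[*]").appName("seq-vs-tuple").getOrCreate()
    singleColumn(spark).show()  // three rows, one "value" column
    threeColumns(spark).show()  // two rows, three named columns
    spark.stop()
  }
}
```

Calling toDF("ezfilename", "match_count", "non_matchcount") on the result of singleColumn fails with the same "number of columns doesn't match" error from the question, because there is only one column to rename.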

Hope this helps!
