I have some individual values that I need to convert into a DataFrame. I tried the code below, expecting a single row of output:
val matchingcount= 3
val notmatchingcount=5
val filename=h:/filename1
import spark.implicits._
val data=Seq(" filename "," matchingcount "," notmatchingcount ").toDF("ezfilename","match_count","non_matchcount")
data.show()
It throws this error:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (1): value
New column names (3): ezfilename, match_count, non_matchcount
Any help, please?
CodePudding user response:
You were almost there! The code that does what you want is the following:
val matchingcount= 3
val notmatchingcount=5
val filename="h:/filename1"
import spark.implicits._
val data=Seq((filename,matchingcount,notmatchingcount)).toDF("ezfilename","match_count","non_matchcount")
data.show()
+------------+-----------+--------------+
|  ezfilename|match_count|non_matchcount|
+------------+-----------+--------------+
|h:/filename1|          3|             5|
+------------+-----------+--------------+
There are 3 key differences between your code and the code above:

- In Scala, a string literal must be surrounded by `"` characters, so I've added them to `val filename=`.
- You were right that you can use a `Seq` with the `toDF` method after `import spark.implicits._`, but each element of the `Seq` becomes one row of the DataFrame. So instead of creating a DataFrame with 3 columns, you were creating one with 3 rows and a single column (which is why the error mentions one old column named `value`). The way to create 3 columns is to put tuples inside your `Seq`. Notice the difference between `Seq(bla, bla, bla)` and `Seq((bla, bla, bla))`, where the latter is the correct one. You can also create multiple rows this way: `Seq((bla, bli, blu), (blo, ble, bly))`.
- In Scala, you access a variable's value by simply writing the variable's name, so writing `filename` instead of `" filename "` is the correct way to do that.
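The element-vs-tuple distinction can be seen in plain Scala, without a Spark session; here is a small sketch using the values from your post:

```scala
// Three separate elements: toDF would turn this into 3 rows x 1 column
val flat = Seq("h:/filename1", "3", "5")

// One three-field tuple: toDF would turn this into 1 row x 3 columns
val nested = Seq(("h:/filename1", 3, 5))

println(flat.length)    // 3 elements -> 3 rows
println(nested.length)  // 1 element  -> 1 row
println(nested.head._1) // the tuple's first field: "h:/filename1"
```

Since `toDF` maps each `Seq` element to one row, the tuple's arity is what determines the column count, which is why it must match the number of names you pass to `toDF`.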
Hope this helps!