I'm new to Scala and I'm reading some CSV data from a URL without actually saving it to a CSV file. I'm storing that data in a List[Array[String]]:
The code below produces a DF with a single column named "value", where each Array in the list becomes a row of that column. I'm attempting to create a 15-column DF instead, since each array has a length of 15. Any advice on how to do this?
import java.io.{BufferedReader, InputStreamReader, IOException}
import java.net.{MalformedURLException, URL}
import com.opencsv.CSVReader
import scala.collection.JavaConverters._
import org.apache.spark.sql.SparkSession

var stockURL: URL = null
val spark: SparkSession = SparkSession.builder.master("local").getOrCreate
import spark.implicits._
val sc = spark.sparkContext
try {
  stockURL = new URL("someurlimreadingfrom.com/asdf")
  val in: BufferedReader = new BufferedReader(new InputStreamReader(stockURL.openStream))
  val reader: CSVReader = new CSVReader(in)
  val allRows: List[Array[String]] = reader.readAll.asScala.toList
  val allRowsDF = sc.parallelize(allRows).toDF()
  allRowsDF.show
} catch {
  case e: MalformedURLException =>
    e.printStackTrace()
  case e: IOException =>
    e.printStackTrace()
}
I had to hide the URL and the resulting DF because the data is sensitive; apologies.
CodePudding user response:
If I understand your question correctly, here is a piece of code that does it.
It works for an Array of length 3, and you can easily extend it to 15.
val allRows: List[Array[String]] =
  List(Array("a", "b", "c"), Array("a", "b", "c"))
val df1 = spark.sparkContext.parallelize(allRows).toDF()
df1
  .withColumn("col0", $"value".getItem(0))
  .withColumn("col1", $"value".getItem(1))
  .show()
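Chaining 15 `withColumn` calls by hand gets tedious, so here is a sketch that builds all the `getItem` projections in a loop and applies them with a single `select`. The sample data and the generated column names `col0` through `col14` are placeholders I made up; substitute your real rows and whatever header names fit your CSV.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ArrayToColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").getOrCreate()
    import spark.implicits._

    // Stand-in data; your real arrays would each have length 15.
    val allRows: List[Array[String]] =
      List(Array("a", "b", "c"), Array("x", "y", "z"))
    val n = allRows.head.length

    val df = spark.sparkContext.parallelize(allRows).toDF()

    // Build one column expression per array index instead of
    // chaining withColumn calls: value[i] AS col{i}.
    val cols = (0 until n).map(i => col("value").getItem(i).as(s"col$i"))

    // A single select projects the array column into n named columns.
    df.select(cols: _*).show()

    spark.stop()
  }
}
```

Using one `select` over a generated `Seq[Column]` is also cheaper than repeated `withColumn` calls, since each `withColumn` creates a new projection in the query plan.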