How to convert println output to a DataFrame in Scala

Time:12-10

I have this code, which loops over a list with a for and prints one result per element. I want to put what println outputs into a DataFrame so that I can manipulate the resulting data, in Scala.

for (l <- ListArchive) {
  val LastModified: (String, String) = (l, getLastModifiedLCO(l))
  println(LastModified)
}

Output of println:

(LCO_2014-12-09_3.XML.gz,Tue Dec 09 07:48:30 UTC 2014)
(LCO_2014-12-09_1.XML.gz,Tue Dec 09 07:48:30 UTC 2014)

CodePudding user response:

Instead of printing each tuple, rewrite the loop so that it builds a list/sequence of tuples, and then turn that into a DataFrame. Something like this:

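// `spark` is the active SparkSession (predefined in spark-shell); this import enables .toDF
import spark.implicits._
// Assumes ListArchive is a Seq or List (call .toSeq first if it is an Array).
// Build (name, last modified) tuples instead of printing them, then name the two columns.
val df = ListArchive.map(l => (l, getLastModifiedLCO(l)))
  .toDF("col1Name", "col2Name")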
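
For anyone running this outside spark-shell, here is a minimal self-contained sketch of the same idea. It assumes a local SparkSession created just for illustration, and because the real getLastModifiedLCO is not shown in the question, it is replaced by a hypothetical stub that returns a fixed string. The object name, the toPairs helper, and the column names fileName and lastModified are placeholders as well.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PrintlnToDataFrame {
  // Hypothetical stand-in: the real getLastModifiedLCO is not shown in the question.
  def getLastModifiedLCO(name: String): String = "Tue Dec 09 07:48:30 UTC 2014"

  // Collect the (file name, last modified) pairs instead of printing them.
  def toPairs(archives: Seq[String]): Seq[(String, String)] =
    archives.map(l => (l, getLastModifiedLCO(l)))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("println-to-df")
      .getOrCreate()
    import spark.implicits._ // enables .toDF on a Seq of tuples

    val ListArchive = Seq("LCO_2014-12-09_3.XML.gz", "LCO_2014-12-09_1.XML.gz")
    val df: DataFrame = toPairs(ListArchive).toDF("fileName", "lastModified")
    df.show(false) // print the rows without truncating long strings

    spark.stop()
  }
}
```

Keeping the tuple-building step in its own function makes it possible to check the data before it ever reaches Spark.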

If the list is very large, you can instead turn it into an RDD with parallelize and apply a similar map to it; the map then runs in a distributed manner across the cluster rather than on the driver.
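
The parallelize variant could look roughly like the sketch below. It rests on the same assumptions as before: getLastModifiedLCO is a hypothetical stub, and the object name, the build helper, and the column names are invented for illustration. Note that the function passed to map runs on the executors, so whatever the real getLastModifiedLCO reads (for example, a file system) would need to be reachable from them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ParallelizeToDataFrame {
  // Hypothetical stand-in: the real getLastModifiedLCO is not shown in the question.
  def getLastModifiedLCO(name: String): String = "Tue Dec 09 07:48:30 UTC 2014"

  // Distribute the names as an RDD, map each one to a tuple, then convert to a DataFrame.
  def build(spark: SparkSession, archives: Seq[String]): DataFrame = {
    import spark.implicits._ // enables .toDF on an RDD of tuples
    spark.sparkContext
      .parallelize(archives)
      .map(l => (l, getLastModifiedLCO(l))) // executed on the executors
      .toDF("fileName", "lastModified")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("parallelize-to-df")
      .getOrCreate()

    val ListArchive = Seq("LCO_2014-12-09_3.XML.gz", "LCO_2014-12-09_1.XML.gz")
    build(spark, ListArchive).show(false)

    spark.stop()
  }
}
```

For a list of a few thousand file names, the plain Seq version is usually simpler; the RDD route mainly pays off when the per-element call is expensive or the list is very long.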
