I have this code, which generates a list by means of a for loop. I want to take the output of the println and put it into a DataFrame so that I can manipulate the resulting data, in Scala.
for (l <- ListArchive) {
  val LastModified: (String, String) = (l, getLastModifiedLCO(l))
  println(LastModified)
}
Output of println:
(LCO_2014-12-09_3.XML.gz,Tue Dec 09 07:48:30 UTC 2014)
(LCO_2014-12-09_1.XML.gz,Tue Dec 09 07:48:30 UTC 2014)
CodePudding user response:
Rewrite it to generate a list/sequence, and then turn that into a DataFrame. Something like this:
import spark.implicits._
val df = ListArchive.map(l => (l, getLastModifiedLCO(l)))
.toDF("col1Name", "col2Name")
If the list is very big, you can turn it into an RDD via parallelize and then apply a similar map to it. The map will then run in a distributed manner.
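A minimal sketch of the parallelize variant, under some assumptions: the SparkSession is created locally with `local[*]`, the column names `fileName` and `lastModified` are made up for illustration, and `getLastModifiedLCO` is replaced by a hypothetical stand-in because the real function is not shown in the question.

```scala
import org.apache.spark.sql.SparkSession

object LastModifiedToDataFrame {
  // Hypothetical stand-in: the real getLastModifiedLCO from the question is not shown.
  def getLastModifiedLCO(name: String): String = "Tue Dec 09 07:48:30 UTC 2014"

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting; on a cluster, drop .master(...)
    val spark = SparkSession.builder()
      .appName("last-modified-to-df")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings toDF into scope for RDDs and Seqs of tuples

    // Sample values taken from the println output in the question
    val ListArchive = Seq("LCO_2014-12-09_3.XML.gz", "LCO_2014-12-09_1.XML.gz")

    val df = spark.sparkContext
      .parallelize(ListArchive)             // distribute the list as an RDD[String]
      .map(l => (l, getLastModifiedLCO(l))) // runs on the executors
      .toDF("fileName", "lastModified")     // illustrative column names

    df.show(truncate = false)
    spark.stop()
  }
}
```

Because the map now runs on the executors, whatever getLastModifiedLCO relies on (for example, access to the files) would have to be available there too. For a short list, the plain `ListArchive.map(...).toDF(...)` version is probably simpler.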