Code transformation from Scala to Pyspark

Time:10-25

I am very new to Scala and PySpark. I have to transform this piece of code, written in Scala, into PySpark. Can someone help me understand the Scala syntax so that I can transform it?

val df = spark.read.parquet(s"$basePath/dod_m/")
  .select(df2.map(x => col(x._1).as(x._2)).toList :_*)

Answer:

Most likely, df2 is a plain Scala collection here, not a DataFrame.

If it were a DataFrame, df2.map(x => col(x._1).as(x._2)) would fail with error: value _1 is not a member of org.apache.spark.sql.Row. The map function on a DataFrame hands you each element as a Row object, not a tuple.

If it were a Dataset of (String, String), for instance, df2.map(x => col(x._1).as(x._2)) would fail with error: Unable to find encoder for type org.apache.spark.sql.Column. Even if you defined such an encoder, you would then get error: value toList is not a member of org.apache.spark.sql.Dataset[org.apache.spark.sql.Column], which is self-explanatory.

RDDs do not possess the toList method either.

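So let's assume df2 is a Scala collection of (String, String) pairs. df2.map(x => col(x._1).as(x._2)).toList renames columns: for each tuple x, x._1 (the first element) is the existing column name and x._2 (the second element) is the new name, so col(x._1).as(x._2) is column x._1 aliased to x._2. toList turns the result into a List[Column], and the trailing :_* expands that list into separate arguments, because select takes a variable number of Column arguments.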
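For readers coming from Python, each piece of that Scala expression has a direct counterpart: x._1 and x._2 become x[0] and x[1], map over a collection becomes a list comprehension, and :_* becomes * unpacking in a call. The sketch below runs without Spark. Its select is a hypothetical stand-in that only returns its positional arguments, and plain strings stand in for Column objects.

```python
# Hypothetical stand-in for DataFrame.select: like the real method,
# it accepts any number of positional arguments.
def select(*cols):
    return list(cols)

df2 = [("a", "b"), ("c", "d")]

# Scala: df2.map(x => col(x._1).as(x._2)).toList
# x._1 / x._2 become x[0] / x[1]; map becomes a list comprehension.
# Strings like "a AS b" stand in for real Column expressions here.
renames = [f"{x[0]} AS {x[1]}" for x in df2]

# Scala: select(renames :_*)  ->  Python: select(*renames)
# The * spreads the list into separate positional arguments.
print(select(*renames))

# Without *, the whole list arrives as a single argument.
print(select(renames))
```

PySpark's DataFrame.select accepts either separate columns or a single list of them, so both df.select(*cols) and df.select(cols) work there. The Scala select used here has no overload taking a List, which is why the original code needs :_*.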

An example in Scala:

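import org.apache.spark.sql.functions.col
import spark.implicits._ // needed for toDF; spark-shell already imports both
val df2 = Seq(("a", "b"), ("c", "d"))
val df = Seq((1, 2), (4, 5)).toDF("a", "c")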

// Evaluating this in spark-shell shows that it builds a list of renamed columns
df2.map(x => col(x._1).as(x._2)).toList
//res2: List[org.apache.spark.sql.Column] = List(a AS b, c AS d)

Let's try:

df.show
+---+---+
|  a|  c|
+---+---+
|  1|  2|
|  4|  5|
+---+---+


df.select(df2.map(x => col(x._1).as(x._2)).toList :_*).show
+---+---+
|  b|  d|
+---+---+
|  1|  2|
|  4|  5|
+---+---+

In Python:

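from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()  # reuses the existing session in the pyspark shell

df2 = [("a", "b"), ("c", "d")]
df = spark.createDataFrame([(1, 2), (4, 5)], ['a', 'c'])
df.select([f.col(x[0]).alias(x[1]) for x in df2]).show()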
+---+---+
|  b|  d|
+---+---+
|  1|  2|
|  4|  5|
+---+---+