I have a table in RDBMS which I'm taking into a dataframe(DF1):
1 employee_id
2 employee_name
3 salary
4 designation
And I have a dataframe(DF2) with the following:
_c0 _c1 _c2 _c3
101 monali 70000 developer
102 Amy 70000 developer
103 neha 65000 tester
How do I define the schema for DF2 from DF1. I want DF2 to have the schema that is defined in the above table. expected output:
employee_id employee_name salary designation
101 monali 70000 developer
102 Amy 70000 developer
103 neha 65000 tester
I want to make it parameterized.
CodePudding user response:
You can create a function mapColumnNames
that takes two parameters, dataframe containing the columns (which I call columns
dataframe) and dataframe you want to change columns' name (which I call data
dataframe).
This function first retrieves name and id of each column in columns
dataframe as a list of tuples. Then it iterates over this list of tuples, applying method withColumnRenamed on data
dataframe on each iteration.
Then you can call this function mapColumnNames
with DF1
as columns
dataframe and DF2
as data
dataframe.
Below the complete code:
def mapColumnNames(columns: DataFrame, data: DataFrame): DataFrame = {
val columnNames = columns.collect().map(x => (x.getInt(0) - 1, x.getString(1)))
columnNames.foldLeft(data)((data, columnName) => {
data.withColumnRenamed(s"_c${columnName._1}", columnName._2)
})
}
val output = mapColumnNames(DF1, DF2)
CodePudding user response:
It wasn't clear what schema does your df1 holds, so used index 1 reference to fetch columns
val columns = df1.select($"1").collect()
Otherwise, we can get all the columns associated with the first dataframe
val columns = df1.schema.fieldNames.map(col(_))
and then use select with columns fetched for our new dataframe
val newDF = df2.select(columns :_*)