How to rename the columns starting with 'abcd' to starting with 'wxyz' in Spark

Time:05-13

How can I rename the columns starting with abcd so that they start with wxyz instead?

List of columns: abcd_name, abcd_id, abcd_loc, empId, empCode

I need to change the names of the columns in a dataframe that start with abcd.

Required column list: wxyz_name, wxyz_id, wxyz_loc, empId, empCode

I tried getting the list of matching columns using the code below, but I'm not sure how to implement the renaming.

val df_cols_abcd = df.columns.filter(_.startsWith("abcd")).map(df(_))
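The filter above already selects the right names; what is missing is turning each one into its new name. As a minimal sketch in plain Scala (no Spark needed), assuming the names come as an Array[String] like the one df.columns returns, the new names could be computed as follows. The helper renamePrefix is hypothetical, not part of Spark.

```scala
object PrefixRename {
  // Hypothetical helper: replace oldPrefix with newPrefix; other names pass through unchanged
  def renamePrefix(name: String, oldPrefix: String, newPrefix: String): String =
    if (name.startsWith(oldPrefix)) newPrefix + name.substring(oldPrefix.length)
    else name

  def main(args: Array[String]): Unit = {
    // Stand-in for df.columns, using the column names from the question
    val columns = Array("abcd_name", "abcd_id", "abcd_loc", "empId", "empCode")
    val newNames = columns.map(renamePrefix(_, "abcd", "wxyz"))
    println(newNames.mkString(", ")) // prints "wxyz_name, wxyz_id, wxyz_loc, empId, empCode"
  }
}
```

Because newNames has the same length and order as df.columns, in Spark it could be passed to df.toDF(newNames: _*) to rename every column in one call. That is an alternative to the foldLeft approach in the answer below, not part of it.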

Answer:

You can do that with foldLeft:

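// Rename every column that starts with oldPrefix, keeping the rest of its name
val oldPrefix = "abcd"
val newPrefix = "wxyz"
val newDf = df.columns
  .filter(_.startsWith(oldPrefix))     // abcd_name, abcd_id, abcd_loc
  .foldLeft(df)((acc, oldName) =>      // acc: the dataframe renamed so far
    acc.withColumnRenamed(oldName, newPrefix + oldName.substring(oldPrefix.length))
  )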

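Your first idea, filtering the columns with startsWith, is correct. The only thing you are missing is the part where you rename those columns. Note that withColumnRenamed takes column names (Strings), not Column objects, so the .map(df(_)) step is not needed.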

If you're not familiar with foldLeft, I recommend reading up on it. The idea is the following:

  • I start with an initial dataframe (df, in the first pair of parentheses).
  • I then apply a function (the one in the second pair of parentheses) once for each column I need to rename. The function takes two arguments: an accumulator (acc), which is the intermediate dataframe (the columns are renamed one at a time), and the current element of the list (here, the name of a column that needs to be renamed). It returns acc with that one column renamed, and this result becomes acc for the next column; after the last column, the final acc is newDf.
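To watch the accumulator change without a Spark session, the sketch below runs the same filter-then-foldLeft shape over plain Scala values. It is only an analogy: a Vector[String] of column names stands in for the dataframe, and the hypothetical renameColumn plays the role of withColumnRenamed (it returns a new value with one name replaced, or the input unchanged if the name is absent). Each step prints the intermediate accumulator.

```scala
object FoldLeftTrace {
  // Hypothetical stand-in for withColumnRenamed: returns a new Vector with
  // oldName replaced by newName; unchanged if oldName is not present
  def renameColumn(cols: Vector[String], oldName: String, newName: String): Vector[String] =
    cols.map(c => if (c == oldName) newName else c)

  // Same shape as the Spark answer: filter the names, then fold starting from the full "dataframe"
  def renameAll(cols: Vector[String], oldPrefix: String, newPrefix: String): Vector[String] =
    cols
      .filter(_.startsWith(oldPrefix))
      .foldLeft(cols) { (acc, oldName) =>
        val next = renameColumn(acc, oldName, newPrefix + oldName.substring(oldPrefix.length))
        println(s"after $oldName: ${next.mkString(", ")}")
        next // the returned value becomes acc for the next column
      }

  def main(args: Array[String]): Unit = {
    val initial = Vector("abcd_name", "abcd_id", "abcd_loc", "empId", "empCode")
    println(renameAll(initial, "abcd", "wxyz").mkString(", "))
  }
}
```

The first printed line shows only abcd_name renamed, the second adds abcd_id, and the third adds abcd_loc; the value left after the last step is what foldLeft returns, just as newDf is in the Spark version.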