I have a DataFrame as shown below. I need to remove the last occurrence of the string "_value" from each column name of the DataFrame.
import spark.implicits._
import org.apache.spark.sql.functions._
val simpledata = Seq(("file1","name1","101"),
("file1","name1","101"),
("file1","name1","101"),
("file1","name1","101"),
("file1","name1","101"))
val df = simpledata.toDF("filename_value","name_value_value","serialNo_value")
df.show()
If I use replaceAll:
val renamedColumnsDf = df.select(df.columns.map(c => df(c).as(c.replaceAll("_value", ""))): _*)
it removes every "_value", but I need to remove only the last occurrence.
How can I remove just the last occurrence of the string from each column name?
My output should be:
+--------+----------+--------+
|filename|name_value|serialNo|
+--------+----------+--------+
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
+--------+----------+--------+
Answer:
If you only want to remove "_value" when it is the suffix of the column name, you can do the following:
import org.apache.spark.sql.DataFrame
val simpleDf: DataFrame = simpledata.toDF("filename_value", "name_value_value", "serialNo_value")
val suffix: String = "_value"
// Rename one column at a time, dropping the suffix only where it ends the name
val renamedDf: DataFrame = simpleDf.columns.foldLeft(simpleDf) { (df, c) =>
  if (c.endsWith(suffix)) df.withColumnRenamed(c, c.substring(0, c.length - suffix.length))
  else df
}
renamedDf.show()
The output will be:
+--------+----------+--------+
|filename|name_value|serialNo|
+--------+----------+--------+
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
+--------+----------+--------+
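The same suffix-only renaming can be written more compactly with StringOps.stripSuffix, or with the question's replaceAll once the pattern is anchored to the end of the name with $. The sketch below uses hypothetical names (ColumnRenames, dropSuffix, dropSuffixRegex, dropLastOccurrence) and plain strings so it runs without Spark. dropLastOccurrence is an extra variant, assuming you also need names like "a_value_b" (not in the sample data), where the last "_value" is not at the end:

```scala
import java.util.regex.Pattern

object ColumnRenames {
  val suffix = "_value"

  // Same behavior as the foldLeft answer: drop the suffix only when it ends the name
  def dropSuffix(name: String): String = name.stripSuffix(suffix)

  // The question's replaceAll, anchored with "$" so only a trailing match is removed
  def dropSuffixRegex(name: String): String =
    name.replaceAll(Pattern.quote(suffix) + "$", "")

  // Hypothetical variant: cut out the last occurrence wherever it sits in the name
  def dropLastOccurrence(name: String): String = {
    val i = name.lastIndexOf(suffix)
    if (i < 0) name else name.substring(0, i) + name.substring(i + suffix.length)
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("filename_value", "name_value_value", "serialNo_value")
    println(names.map(dropSuffix).mkString(", "))      // filename, name_value, serialNo
    println(names.map(dropSuffixRegex).mkString(", ")) // filename, name_value, serialNo
    println(dropLastOccurrence("a_value_b"))           // a_b
  }
}
```

To apply one of these to the DataFrame in a single call, Dataset.toDF(colNames: String*) accepts the full list of new names, e.g. simpleDf.toDF(simpleDf.columns.map(ColumnRenames.dropSuffix): _*).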
Answer:
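A simpler option is to pattern match on each column name with a string-interpolator pattern (requires Scala 2.13 or later) inside the map over the columns:
val renamedDf = df.select(df.columns.map { columnName =>
  // Binds everything before a trailing "_value"; other names are kept as-is
  val newName = columnName match {
    case s"${something}_value" => something
    case other => other
  }
  df(columnName).as(newName)
}: _*)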
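To check what the interpolator pattern binds for each sample name without starting Spark, the same match can be run on plain strings. This is a minimal sketch assuming Scala 2.13+; PatternRename and renameByPattern are hypothetical names. Because the pattern has to consume the whole name, the binding for "name_value_value" stops just before the final "_value":

```scala
object PatternRename {
  // Interpolator pattern: `prefix` binds everything before a trailing "_value"
  def renameByPattern(name: String): String = name match {
    case s"${prefix}_value" => prefix
    case other              => other
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("filename_value", "name_value_value", "serialNo_value", "id")
    // Expected: filename, name_value, serialNo, id
    println(names.map(renameByPattern).mkString(", "))
  }
}
```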