How can I rename a column based on a cell value in PySpark?

Currently I have this situation:

   signal_name  timestamp   signal_value
0  alert        1632733513  on
1  alert        1632733515  off
2  alert        1632733518  on

I want to rename the column signal_value to the value of signal_name. The df was already filtered on the signal name alert, so signal_name has no other value.

   signal_name  timestamp   alert
0  alert        1632733513  on
1  alert        1632733515  off
2  alert        1632733518  on

Since the signal name is now carried by the column name, the signal_name column is no longer needed, so I would like to drop it.

   timestamp    alert
0  1632733513   on
1  1632733515   off
2  1632733518   on

Since there are multiple DataFrames (one per signal_name) with this problem, the approach should be generic.

Answer:

If you control the step where the DataFrame is filtered on signal_name, you can rename the column using the same value you used in the filter.

Otherwise, you can read the first value of the signal_name column into a Python variable and then use it to rename the signal_value column (this assumes the DataFrame is not empty; first() returns None otherwise):

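from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [("alert", "1632733513", "on"), ("alert", "1632733515", "off"), ("alert", "1632733518", "on")]
df = spark.createDataFrame(data, ["signal_name", "timestamp", "signal_value"])

# Read the signal name from the first row (the df holds only one signal_name)
signal_name = df.select("signal_name").first().signal_name

# Rename signal_value to that name, then drop the now-redundant signal_name column
df1 = df.withColumnRenamed("signal_value", signal_name).drop("signal_name")

df1.show()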

# +----------+-----+
# | timestamp|alert|
# +----------+-----+
# |1632733513|   on|
# |1632733515|  off|
# |1632733518|   on|
# +----------+-----+