I have a column named version with integer values 1, 2, ..., up to 8. I want to replace every value in that column with the column's maximum, which in this case is 8, so 1, 2, 3, 4, 5, 6, and 7 should all become 8. I tried a couple of methods but couldn't find a solution.
testDF = spark.createDataFrame([(1,"a"), (2,"b"), (3,"c"), (4,"d"), (5,"e"), (6,"f"), (7,"g"), (8,"h")], ["version", "name"])
testDF.show()
+-------+----+
|version|name|
+-------+----+
|      1|   a|
|      2|   b|
|      3|   c|
|      4|   d|
|      5|   e|
|      6|   f|
|      7|   g|
|      8|   h|
+-------+----+
Expected output:
+-------+----+
|version|name|
+-------+----+
|      8|   a|
|      8|   b|
|      8|   c|
|      8|   d|
|      8|   e|
|      8|   f|
|      8|   g|
|      8|   h|
+-------+----+
CodePudding user response:
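Try this. It computes the column's maximum once with an aggregation, collects that single value to the driver, and overwrites the column with it as a literal:
from pyspark.sql.functions import lit
testDF = testDF.withColumn("version", lit(testDF.agg({"version": "max"}).collect()[0][0]))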
Output:
+-------+----+
|version|name|
+-------+----+
|      8|   a|
|      8|   b|
|      8|   c|
|      8|   d|
|      8|   e|
|      8|   f|
|      8|   g|
|      8|   h|
+-------+----+
To use the maximum plus one instead, add 1 to the collected value:
testDF.withColumn("version", lit(testDF.agg({"version": "max"}).collect()[0][0] + 1))
Output:
+-------+----+
|version|name|
+-------+----+
|      9|   a|
|      9|   b|
|      9|   c|
|      9|   d|
|      9|   e|
|      9|   f|
|      9|   g|
|      9|   h|
+-------+----+