I have a string column in my dataframe with values like '1 1 1 3'. I need to update the column with the max of those values in PySpark.
I have tried a UDF that converts the string to a list and back to a string. Is there a simpler way?
TIA.
CodePudding user response:
from pyspark.sql import functions as F
a = [["1 1 1 3"]]
b = ["foo"]
df = spark.createDataFrame(a, b)
df.withColumn("bar", F.array_max(F.split("foo", " "))).show()
+-------+---+
|    foo|bar|
+-------+---+
|1 1 1 3|  3|
+-------+---+