max of space separated values in a column with string type in pyspark

Time:10-08

I have a string column in my DataFrame with values such as '1 1 1 3'. I need to replace each value with the maximum of its space-separated numbers, using PySpark.

I have tried a UDF that converts the string to a list and then back to a string. Is there a simpler way?

Sample data: [image]

TIA.

CodePudding user response:

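Use split, cast the resulting array to integers, and then apply array_max:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

a = [["1 1 1 3"]]
b = ["foo"]
df = spark.createDataFrame(a, b)

# split() returns array<string>; cast to array<int> so the max is numeric, not lexicographic
df.withColumn("bar", F.array_max(F.split("foo", " ").cast("array<int>"))).show()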

+-------+---+
|    foo|bar|
+-------+---+
|1 1 1 3|  3|
+-------+---+
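
One caveat: split produces strings, and a maximum taken over strings is lexicographic. Single-digit data like the sample is unaffected, but a value such as '9 10' would give '9'. Below is a minimal plain-Python sketch of that effect. The function names are illustrative only and are not part of the PySpark API. In PySpark, casting the split result to array<int> before calling array_max should give the numeric maximum.

```python
def max_as_strings(value: str) -> str:
    # Compares tokens as strings, so "9" sorts above "10".
    return max(value.split(" "))


def max_as_ints(value: str) -> int:
    # Converts each token to int first, so the comparison is numeric.
    return max(int(token) for token in value.split(" "))


print(max_as_strings("1 1 1 3"))  # single digits: both approaches agree
print(max_as_strings("9 10"))     # string comparison
print(max_as_ints("9 10"))        # numeric comparison
```

If the column can contain empty strings or non-numeric tokens, the integer conversion needs extra handling, which this sketch leaves out.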