I am trying out some basic aggregate functions in PySpark, such as min and max. With pandas, df.min() returns each column together with its minimum value, as in the image I have attached.
I need the same output from PySpark code, but I don't know how to produce it.
Please help me with this.
CodePudding user response:
You can try the code below. It builds one min() expression per column and passes them all to selectExpr:
# sample data
data = [(1,10,"a"), (2,10,"c"), (0, 100, "t")]
cols = ["col1", "col2", "col3"]
df = spark.createDataFrame(data, cols)
df.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|  10|   a|
|   2|  10|   c|
|   0| 100|   t|
+----+----+----+
df.selectExpr([f"min({x}) as {x}" for x in cols]).show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   0|  10|   a|
+----+----+----+
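If your column names contain spaces or other special characters, the plain f-string above may fail to parse. Here is a minimal sketch of the same idea wrapped in two helpers. The names build_min_exprs and min_per_column are hypothetical (they are not part of PySpark), and the sketch assumes Spark SQL's backtick quoting for identifiers, with a literal backtick escaped by doubling it.

```python
def build_min_exprs(columns):
    """Return one SQL expression string per column, e.g. 'min(`col1`) as `col1`'.

    Hypothetical helper: backticks quote each name so that spaces or dots
    in a column name do not break the expression.
    """
    exprs = []
    for name in columns:
        # Assumption: a backtick inside a name is escaped by doubling it.
        quoted = "`" + name.replace("`", "``") + "`"
        exprs.append(f"min({quoted}) as {quoted}")
    return exprs


def min_per_column(df, columns=None):
    """Return a one-row DataFrame with the minimum of each selected column.

    Hypothetical helper: df is expected to be a PySpark DataFrame.
    """
    columns = df.columns if columns is None else columns
    return df.selectExpr(*build_min_exprs(columns))


# The expression strings can be inspected without a Spark session.
print(build_min_exprs(["col1", "col2", "col3"]))
```

With the sample df from above, min_per_column(df).show() should give the same single-row result as the selectExpr call. Replacing min with max in the expression string would give the per-column maximum in the same way.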