How to get the same output in PySpark as in pandas

Time:12-13

I am trying some basic functions in PySpark, such as min and max. With pandas, df.min() returns every column together with its minimum value.

I need the same output from PySpark code, but I don't know how to do that.

Please help me with this.

CodePudding user response:

You can try the code below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample data
data = [(1, 10, "a"), (2, 10, "c"), (0, 100, "t")]
cols = ["col1", "col2", "col3"]
df = spark.createDataFrame(data, cols)
df.show()

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|  10|   a|
|   2|  10|   c|
|   0| 100|   t|
+----+----+----+

# one min() per column, aliased back to the original column name
df.selectExpr([f"min({x}) as {x}" for x in cols]).show()

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   0|  10|   a|
+----+----+----+
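If you want the result as column/value pairs rather than a one-row table, which is closer to what pandas df.min() prints, one option is to collect that single row into a dictionary. The sketch below is only an illustration: the helper names min_expressions and column_minimums are made up for this example, and it assumes df is an existing Spark DataFrame such as the one above.

```python
def min_expressions(columns):
    """Build one SQL min() expression per column, aliased to the column name."""
    # Backticks keep column names containing spaces or dots valid in Spark SQL.
    return [f"min(`{c}`) as `{c}`" for c in columns]


def column_minimums(df):
    """Return {column: minimum} for a Spark DataFrame (hypothetical helper).

    Assumes df exposes .columns and .selectExpr(), as a PySpark DataFrame does.
    """
    # The aggregation yields exactly one row; first() brings it to the driver.
    row = df.selectExpr(*min_expressions(df.columns)).first()
    return row.asDict()
```

Calling column_minimums(df) on the sample data should give a dictionary with one entry per column. If pandas is installed, wrapping it as pd.Series(column_minimums(df)) prints it in the same vertical layout as df.min().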