I want to replicate the query below using PySpark DataFrame functions instead of a SQL query.
spark.sql("select date from walmart_stock order by high desc limit 1").show()
CodePudding user response:
Here is the code if you start from the linked CSV file; you should recognize the SQL functions. Note that we use the inferSchema option to parse the numbers directly into doubles and obtain the correct ordering (sorting would not work as expected with the default string type). Another way would be to cast the column after reading the CSV.
import pyspark.sql.functions as f

(spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("walmart_stock.csv")
    .orderBy(f.col("High").desc())
    .limit(1)
    .select("Date")
    .show())
which yields
+----------+
|      Date|
+----------+
|2015-11-13|
+----------+