SQL query to PySpark DataFrame function


I want to replicate the below code using PySpark DataFrame functions instead of a SQL query.

spark.sql("select date from walmart_stock order by high desc limit 1").show()

Link to dataset

CodePudding user response:

Here is the equivalent code if you start from the linked CSV file; you should recognize the SQL clauses. Note that we use the inferSchema option so that the numeric columns are parsed directly as doubles, which gives the correct ordering (with the default string type the sort would be lexicographic and would not work as expected). Another way would be to cast the column after reading the CSV, as sketched after the output below.

from pyspark.sql import functions as f

(spark.read
    .option("header", "true")        # first row of the CSV holds column names
    .option("inferSchema", "true")   # parse numeric columns as doubles
    .csv("walmart_stock.csv")
    .orderBy(f.col("High").desc())   # sort by High, descending
    .limit(1)                        # keep only the top row
    .select("Date")
    .show())

which yields

+----------+
|      Date|
+----------+
|2015-11-13|
+----------+
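
For the casting alternative mentioned above, here is a minimal sketch, assuming the CSV is read without inferSchema so that every column comes in as a string:

from pyspark.sql import functions as f

# Without inferSchema every column is a string, so cast High to double
# to get a numeric rather than lexicographic ordering.
(spark.read
    .option("header", "true")
    .csv("walmart_stock.csv")
    .withColumn("High", f.col("High").cast("double"))
    .orderBy(f.col("High").desc())
    .limit(1)
    .select("Date")
    .show())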
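
As an aside, if walmart_stock is already registered as a table or temporary view (which the spark.sql call in the question implies), you can query it directly and skip the CSV read. This is a sketch assuming such a view exists:

from pyspark.sql import functions as f

# Assumes the view was registered beforehand, e.g. with
# df.createOrReplaceTempView("walmart_stock")
(spark.table("walmart_stock")
    .orderBy(f.col("High").desc())
    .limit(1)
    .select("Date")
    .show())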