In PySpark, I have a DataFrame with a column 'Date' that contains a date and time (a timestamp) for each row in the table.
I would like to query all the rows in the DataFrame that happened on the previous day (i.e. the day before the query is executed).
I see this doc https://www.datasciencemadesimple.com/subtract-or-add-days-months-and-years-to-timestamp-in-pyspark/ which talks about how to create a column in a DataFrame that subtracts a day from the values in another column.
But what I am looking for is how to subtract a day from the current date and use that in a filter.
CodePudding user response:
@Kashyap's comment says it all, but here is an example:
from pyspark.sql.functions import col
from datetime import date, timedelta

# Compute yesterday's date on the driver
yesterday = date.today() - timedelta(days=1)

# Keep only the rows whose Date equals yesterday
df.filter(col("Date") == yesterday).show()
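If you prefer to keep the whole computation inside Spark, so that "yesterday" is evaluated at query time from the cluster's current date rather than the driver's local clock, a minimal alternative sketch uses date_sub with current_date (this assumes Date is a date column, as above):

from pyspark.sql.functions import col, current_date, date_sub

# date_sub(current_date(), 1) is evaluated by Spark when the query runs
df.filter(col("Date") == date_sub(current_date(), 1)).show()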
If your Date column actually contains full timestamps, truncate them to the start of the day first so that each value compares equal to a plain date.
from pyspark.sql.functions import col, date_trunc
from datetime import date, timedelta

yesterday = date.today() - timedelta(days=1)

# date_trunc("day", ...) zeroes out the time-of-day portion,
# so rows from any time yesterday match the date literal
df.filter(date_trunc("day", col("Date")) == yesterday).show()
Note that date_trunc only works well with timestamp columns, not strings.
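If Date is a string instead, one option is to parse it with to_date before comparing. This is a sketch under an assumption: the format string "yyyy-MM-dd HH:mm:ss" is not taken from the question, so adjust it to match your data.

from pyspark.sql.functions import col, to_date
from datetime import date, timedelta

yesterday = date.today() - timedelta(days=1)

# to_date parses the string and drops the time component;
# the format string here is an assumed example
df.filter(to_date(col("Date"), "yyyy-MM-dd HH:mm:ss") == yesterday).show()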
Make sure you replace df with your DataFrame name.
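For a quick end-to-end check, here is a self-contained sketch (assuming a local Spark session) that builds a tiny DataFrame with one row from yesterday and one from today, then applies the filter. Only the first row should survive:

from datetime import date, datetime, timedelta

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, date_trunc

spark = SparkSession.builder.master("local[*]").getOrCreate()

yesterday = date.today() - timedelta(days=1)

# Two sample rows: one timestamped 09:00 yesterday, one right now
df = spark.createDataFrame(
    [
        (1, datetime.combine(yesterday, datetime.min.time()) + timedelta(hours=9)),
        (2, datetime.now()),
    ],
    ["id", "Date"],
)

# Expect only id=1 in the output
df.filter(date_trunc("day", col("Date")) == yesterday).show()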