In Snowflake/SQL we can do:
SELECT * FROM myTbl
WHERE date_col
BETWEEN
CONVERT_TIMEZONE('UTC','America/Los_Angeles', some_date_string_col)::DATE - INTERVAL '7 DAY'
AND
CONVERT_TIMEZONE('UTC','America/Los_Angeles', some_date_string_col)::DATE - INTERVAL '1 DAY'
Is there a PySpark translation of this for DataFrames? I imagine it would be something like this:
myDf.filter(
    col("date_col") >= to_utc_timestamp(...)
)
But how can we do the BETWEEN, and also the interval?
CodePudding user response:
You can use INTERVAL within a SQL expression like this:
from pyspark.sql import functions as F

df1 = df.filter(
    F.col("date_col").between(
        F.expr("current_timestamp - interval 7 days"),
        F.expr("current_timestamp - interval 1 days"),
    )
)
However, if you only need day-level granularity, you can simply use the date_add (or date_sub) function:
from pyspark.sql import functions as F

df1 = df.filter(
    F.col("date_col").between(
        F.date_add(F.current_date(), -7),
        F.date_add(F.current_date(), -1),
    )
)
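For completeness, the same timezone-aware window can also be written with the functions API instead of a SQL string. Again, this is a sketch under the assumption that some_date_string_col contains UTC timestamps stored as strings (column names are taken from the question):

from pyspark.sql import functions as F

# Assumption: some_date_string_col holds UTC timestamps as strings.
# from_utc_timestamp + to_date mirrors
# CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', ...)::DATE
local_date = F.to_date(
    F.from_utc_timestamp(
        F.col("some_date_string_col").cast("timestamp"),
        "America/Los_Angeles",
    )
)

df1 = df.filter(
    F.col("date_col").between(
        F.date_sub(local_date, 7),  # converted local date - 7 days
        F.date_sub(local_date, 1),  # converted local date - 1 day
    )
)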