In Spark, the week of month appears to be calculated from Sunday to Saturday by default:
spark.sql(""" select date_format("2022-07-10","W") as week,date_format("2022-07-16","W") as week2 """).show()
Output: 3,3
Can't we have week of month calculated from Monday to Sunday?
spark.sql(""" select date_format("2022-07-10","W") as week """).show()
Desired output: 2
I am looking for a Spark SQL way to achieve this.
CodePudding user response:
There's no built-in method for it. You could use this expression:
date_format(col_date, 'W') +
CASE weekday(date_trunc('MM', col_date))
WHEN 6 THEN (CASE weekday(col_date) WHEN 6 THEN 0 ELSE 1 END)
ELSE (CASE weekday(col_date) WHEN 6 THEN -1 ELSE 0 END)
END
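As a sanity check, the same week-of-month logic can be replayed outside Spark in plain Python (a sketch; `week_of_month` is a hypothetical helper, not a Spark API, and Python's `weekday()` happens to use the same 0 = Monday .. 6 = Sunday numbering as Spark's `weekday` function):

```python
from datetime import date

def week_of_month(d: date, week_start: int = 0) -> int:
    # Hypothetical helper, not a Spark API: week of month with a
    # configurable first day of week. week_start follows Python's
    # weekday() numbering (0 = Monday .. 6 = Sunday), which matches
    # Spark's weekday() function.
    first_of_month = d.replace(day=1)
    # How far into its week the month starts, relative to week_start.
    offset = (first_of_month.weekday() - week_start) % 7
    return (d.day + offset - 1) // 7 + 1

# Monday-start, matching the desired output in the question:
print(week_of_month(date(2022, 7, 10)))                # 2
# Sunday-start, matching Spark's date_format(..., 'W'):
print(week_of_month(date(2022, 7, 10), week_start=6))  # 3
```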
Test:
df = spark.createDataFrame([("2022-07-01",), ("2022-07-02",),
                            ("2022-07-03",), ("2022-07-10",),
                            ("2022-05-01",), ("2022-05-02",)],
                           ["col_date"])
df.createOrReplaceTempView("table")
spark.sql(
"""
SELECT
col_date,
date_format(col_date, 'W') as week1,
(
date_format(col_date, 'W') +
CASE weekday(date_trunc('MM', col_date))
WHEN 6 THEN (CASE weekday(col_date) WHEN 6 THEN 0 ELSE 1 END)
ELSE (CASE weekday(col_date) WHEN 6 THEN -1 ELSE 0 END)
END
) as week2
FROM table
"""
).show()
# +----------+-----+-----+
# |  col_date|week1|week2|
# +----------+-----+-----+
# |2022-07-01|    1|  1.0|
# |2022-07-02|    1|  1.0|
# |2022-07-03|    2|  1.0|
# |2022-07-10|    3|  2.0|
# |2022-05-01|    1|  1.0|
# |2022-05-02|    1|  2.0|
# +----------+-----+-----+
To start the week on other days (e.g. Tuesday), the following expression is better, using dayofweek (which returns 1 = Sunday .. 7 = Saturday):
date_format(col_date, 'W') +
CASE dayofweek(date_trunc('MM', col_date)) < 3
WHEN TRUE THEN (CASE dayofweek(col_date) < 3 WHEN TRUE THEN 0 ELSE 1 END)
ELSE (CASE dayofweek(col_date) < 3 WHEN TRUE THEN -1 ELSE 0 END)
END
Use 3 for Tuesday, 4 for Wednesday, 5 for Thursday, 6 for Friday, 7 for Saturday (the number needs to be changed in 3 places).
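The Tuesday variant can be cross-checked the same way in plain Python (a sketch under the assumption that date_format(col_date, 'W') is the Sunday-start week of month; `dayofweek`, `sunday_week_of_month`, and `tuesday_week_of_month` are hypothetical helpers re-implementing the Spark semantics):

```python
from datetime import date, timedelta

def dayofweek(d: date) -> int:
    # Mirrors Spark's dayofweek(): 1 = Sunday .. 7 = Saturday.
    return d.isoweekday() % 7 + 1

def sunday_week_of_month(d: date) -> int:
    # Assumed equivalent of Spark's date_format(col_date, 'W').
    first = d.replace(day=1)
    offset = (first.weekday() - 6) % 7  # Python: 0 = Monday .. 6 = Sunday
    return (d.day + offset - 1) // 7 + 1

def tuesday_week_of_month(d: date) -> int:
    # Mirrors the SQL expression above: shift the Sunday-start week
    # number depending on whether the month, and the date itself,
    # fall before Tuesday within their weeks.
    first = d.replace(day=1)
    if dayofweek(first) < 3:                 # month starts Sun or Mon
        adj = 0 if dayofweek(d) < 3 else 1
    else:
        adj = -1 if dayofweek(d) < 3 else 0
    return sunday_week_of_month(d) + adj

def direct_tuesday(d: date) -> int:
    # Direct Tuesday-start computation for comparison.
    first = d.replace(day=1)
    offset = (first.weekday() - 1) % 7       # 1 = Tuesday in Python
    return (d.day + offset - 1) // 7 + 1

# Compare the two over several months with different starting weekdays.
d = date(2022, 5, 1)
while d < date(2023, 1, 1):
    assert tuesday_week_of_month(d) == direct_tuesday(d)
    d += timedelta(days=1)
print("expression matches direct Tuesday-start computation")
```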