Home > Blockchain >  Week of month start from Monday in Spark 2.4
Week of month start from Monday in Spark 2.4

Time:07-21

In Spark, week of month looks like by default calculated from Sunday to Saturday

spark.sql(""" select date_format("2022-07-10","W") as week,date_format("2022-07-16","W") as week2 """).show()

Output: 3,3

Can't we have week of month calculated from Monday to Sunday?

spark.sql(""" select date_format("2022-07-10","W") as week """).show()

Desired output: 2

I am looking for a Spark SQL way to achieve this.

CodePudding user response:

There's no built-in method for it. You could use this expression:

date_format(col_date, 'W')  
CASE weekday(date_trunc('MM', col_date))
    WHEN 6 THEN (CASE weekday(col_date) WHEN 6 THEN 0 ELSE 1 END)
    ELSE (CASE weekday(col_date) WHEN 6 THEN -1 ELSE 0 END)
END

Test:

df = spark.createDataFrame([("2022-07-01",), ("2022-07-02",),
                            ("2022-07-03",), ("2022-07-10",),
                            ("2022-05-01",), ("2022-05-02",)],
                           ["col_date"])
df.createOrReplaceTempView("table")

spark.sql(
    """
    SELECT
        col_date,
        date_format(col_date, 'W') as week1,
        (
            date_format(col_date, 'W')  
            CASE weekday(date_trunc('MM', col_date))
                WHEN 6 THEN (CASE weekday(col_date) WHEN 6 THEN 0 ELSE 1 END)
                ELSE (CASE weekday(col_date) WHEN 6 THEN -1 ELSE 0 END)
            END
        ) as week2
    FROM table
    """
).show()
#  ---------- ----- ----- 
# |  col_date|week1|week2|
#  ---------- ----- ----- 
# |2022-07-01|    1|  1.0|
# |2022-07-02|    1|  1.0|
# |2022-07-03|    2|  1.0|
# |2022-07-10|    3|  2.0|
# |2022-05-01|    1|  1.0|
# |2022-05-02|    1|  2.0|
#  ---------- ----- ----- 

For other days (e.g. Tuesday), the following expression is better:

date_format(col_date, 'W')  
CASE dayofweek(date_trunc('MM', col_date)) < 3
    WHEN TRUE THEN (CASE dayofweek(col_date) < 3 WHEN TRUE THEN 0 ELSE 1 END)
    ELSE (CASE dayofweek(col_date) < 3 WHEN TRUE THEN -1 ELSE 0 END)
END

3 for Tuesday
4 for Wednesday
5 for Thursday
6 for Friday
7 for Saturday

(Numbers need to be changed in 3 places)

  • Related