Home > Software design >  Convert date to ISO week date in Spark
Convert date to ISO week date in Spark

Time:12-29

Having dates in one column, how to create a column containing ISO week date?

ISO week date is composed of year, week number and weekday.

  • year is not the same as the year obtained using year function.
  • week number is the easy part - it can be obtained using weekofyear.
  • weekday should return 1 for Monday and 7 for Sunday, while Spark's dayofweek cannot do it.

Example dataframe:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
    ('1977-12-31',),
    ('1978-01-01',),
    ('1978-01-02',),
    ('1978-12-31',),
    ('1979-01-01',),
    ('1979-12-30',),
    ('1979-12-31',),
    ('1980-01-01',)],
    ['my_date']
).select(F.col('my_date').cast('date'))

df.show()
# ---------- 
#|   my_date|
# ---------- 
#|1977-12-31|
#|1978-01-01|
#|1978-01-02|
#|1978-12-31|
#|1979-01-01|
#|1979-12-30|
#|1979-12-31|
#|1980-01-01|
# ---------- 

Desired result:

 ---------- ------------- 
|   my_date|iso_week_date|
 ---------- ------------- 
|1977-12-31|   1977-W52-6|
|1978-01-01|   1977-W52-7|
|1978-01-02|   1978-W01-1|
|1978-12-31|   1978-W52-7|
|1979-01-01|   1979-W01-1|
|1979-12-30|   1979-W52-7|
|1979-12-31|   1980-W01-1|
|1980-01-01|   1980-W01-2|
 ---------- ------------- 

CodePudding user response:

Your solution is already nice, maybe you could shorten it by simplifying the calculations:

  • iso_weekday = (dayofweek(my_date) 5)%7 1
  • iso_year= year(date_add(my_date, 4 - iso_weekday))

Which gives you:

import pyspark.sql.functions as F

df.withColumn(
    'iso_week_date',
    F.concat_ws(
        "-",
        F.year(F.expr("date_add(my_date, 4 - (dayofweek(my_date)   5) % 7   1)")),
        F.lpad(F.weekofyear('my_date'), 3, "W0"),
        (F.dayofweek('my_date')   5) % 7   1
    )
).show()

# ---------- ------------- 
#|   my_date|iso_week_date|
# ---------- ------------- 
#|1977-12-31|   1977-W52-6|
#|1978-01-01|   1977-W52-7|
#|1978-01-02|   1978-W01-1|
#|1978-12-31|   1978-W52-7|
#|1979-01-01|   1979-W01-1|
#|1979-12-30|   1979-W52-7|
#|1979-12-31|   1980-W01-1|
#|1980-01-01|   1980-W01-2|
# ---------- ------------- 

CodePudding user response:

First, one could create rules for columns for year and weekday. Then, concatenate them using concat_ws and lpad.

week_from_prev_year = (F.month('my_date') == 1) & (F.weekofyear('my_date') > 9)
week_from_next_year = (F.month('my_date') == 12) & (F.weekofyear('my_date') == 1)
iso_year = F.when(week_from_prev_year, F.year('my_date') - 1) \
            .when(week_from_next_year, F.year('my_date')   1) \
            .otherwise(F.year('my_date'))
iso_weekday = F.when(F.dayofweek('my_date') != 1, F.dayofweek('my_date')-1).otherwise(7)
iso_week_date = F.concat_ws('-', iso_year, F.lpad(F.weekofyear('my_date'), 3, 'W0'), iso_weekday)
df2 = df.withColumn('iso_week_date', iso_week_date)

df2.show()
# ---------- ------------- 
#|   my_date|iso_week_date|
# ---------- ------------- 
#|1977-12-31|   1977-W52-6|
#|1978-01-01|   1977-W52-7|
#|1978-01-02|   1978-W01-1|
#|1978-12-31|   1978-W52-7|
#|1979-01-01|   1979-W01-1|
#|1979-12-30|   1979-W52-7|
#|1979-12-31|   1980-W01-1|
#|1980-01-01|   1980-W01-2|
# ---------- ------------- 
  • Related