PySpark: Turn parameter into last working day of month

I have a parameter variable that I want to turn into a date indicating the last working day of the month. From what I've read, I can get the last working day from d = date.today(), but not from a plain 202109. My script looks like this:

from pandas.tseries.offsets import BMonthEnd
from datetime import date

date = 202109
d = date_format(date, 'yyyyMM')
offset = BMonthEnd()
lastworkingday = offset.rollforward(d)

I'm pretty sure it goes wrong when turning date into d, but I don't know how to fix it. Additionally, can you tell me how to keep only the date and drop the time in the result? Thank you.

CodePudding user response:

IIUC, you want a column of DateType in your Spark dataframe that is equal to the last working day of the month specified in the input variable date. Your script fails because date_format is a PySpark SQL function that formats existing date columns; it cannot parse a plain Python integer like 202109. Use datetime.strptime to do the parsing on the driver instead. Here is a solution:

from datetime import datetime
from pandas.tseries.offsets import BMonthEnd
import pyspark.sql.functions as F

# input variable
date = 202109

# parse yyyyMM into a datetime (the first day of that month)
d = datetime.strptime(str(date), '%Y%m')
# roll forward to the last business day of the month
offset = BMonthEnd()
last_working_day = offset.rollforward(d)
# format as a date-only string (this drops the time component)
my_date = last_working_day.strftime('%Y-%m-%d')
print(my_date)
# 2021-09-30

# add column to spark dataframe
df = df.withColumn('my_date', F.to_date(F.lit(my_date)))
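If you need this for several months, the same rollforward logic can be wrapped in a small helper. This is just a sketch (the name last_working_day is mine, not from the question); note that BMonthEnd only skips weekends, so months whose last calendar day falls on a Saturday or Sunday roll back to the preceding Friday. If you also need to skip public holidays, look at pandas' CustomBusinessMonthEnd with a holiday calendar instead.

```python
from datetime import date, datetime
from pandas.tseries.offsets import BMonthEnd

def last_working_day(yyyymm: int) -> date:
    """Return the last business day (Mon-Fri) of the month given as yyyyMM."""
    # parse the integer into the first day of the month
    first = datetime.strptime(str(yyyymm), '%Y%m')
    # roll forward to the last business day; .date() drops the time component
    return BMonthEnd().rollforward(first).date()

print(last_working_day(202109))  # 2021-09-30 (a Thursday)
print(last_working_day(202110))  # 2021-10-29 (Oct 31 is a Sunday)
```

The second call shows the weekend handling: October 2021 ends on a Sunday, so the last working day is Friday the 29th.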