I have a requirement for a column that holds the PeriodEndDate. The period always ends on the 23rd of the month.
I need to take a date from a column (in this case it is the last day of each month) and set the "day" of that date to 23.
I have tried doing the following:
.withColumn("periodEndDate", change_day(jsonDF2.periodDate, sf.lit(23)))
but this fails with:
cannot import name 'change_day' from 'pyspark.sql.functions'
CodePudding user response:
You can use make_date:
from pyspark.sql import functions as F
df = spark.createDataFrame([('2022-05-31',)], ['periodDate'])
df = df.withColumn('periodEndDate', F.expr("make_date(year(periodDate), month(periodDate), 23)"))
df.show()
# +----------+-------------+
# |periodDate|periodEndDate|
# +----------+-------------+
# |2022-05-31|   2022-05-23|
# +----------+-------------+
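On Spark 3.3+, make_date is also exposed directly in pyspark.sql.functions, so the same thing can be written with column functions instead of expr (a minimal sketch, assuming that Spark version and reusing the df above):
from pyspark.sql import functions as F

# Rebuild the date from the original year and month, pinning the day to 23
# (F.make_date is only available as a DataFrame function from Spark 3.3 onwards)
df = df.withColumn(
    'periodEndDate',
    F.make_date(F.year('periodDate'), F.month('periodDate'), F.lit(23))
)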
CodePudding user response:
As far as I know, there is no change_day function in pyspark.sql.functions. However, you can make one with a UDF that takes a date and replaces its day.
Example:
from datetime import datetime

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import TimestampType

# Plain Python function: replace the day component, keep year, month and time
def change_day(date, day):
    return date.replace(day=day)

# Wrap it as a Spark UDF that returns a timestamp
change_day = F.udf(change_day, TimestampType())

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([{"date": datetime(2022, 1, 31)}])
df = df.withColumn("23day", change_day(F.col("date"), F.lit(23)))
df.show(20, False)
Result:
+-------------------+-------------------+
|date               |23day              |
+-------------------+-------------------+
|2022-01-31 00:00:00|2022-01-23 00:00:00|
+-------------------+-------------------+
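If you would rather avoid the UDF (UDFs add serialization overhead), the same shift can be done with built-in functions by adding 23 - dayofmonth(date) days. This is a sketch that reuses the df and "date" column from the example above; note that date_add returns a DateType, so the time-of-day part is dropped:
from pyspark.sql import functions as F

# Move the date forward or backward so its day of month becomes 23;
# 23 - dayofmonth(date) is negative after the 23rd, which date_add handles
df = df.withColumn("23day", F.expr("date_add(date, 23 - dayofmonth(date))"))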