Home > Mobile >  Next future date based on month interval
Next future date based on month interval

Time:06-16

I have simple Spark dataframe which has three columns (contract name, contract start date, contract length) as follows:

root
 |-- contract_name: string (nullable = true)
 |-- start_date: date (nullable = true)
 |-- length: long (nullable = true)

As contracts continue to renew, I would like to add a column which shows the next future date the contract would renew. In Python, I calculate this recursively as follows:

def find_next_date(date, periods):
    new_date = date   relativedelta(months=periods)
    if new_date <= datetime.today():
        return find_next_date(new_date, periods)
    else:
        return new_date  

However, is there a way to do this directly in PySpark?

CodePudding user response:

You can create a sequence of dates and pick the last one.

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [('cont1', '2022-01-14', 5),
     ('cont2', '2022-01-15', 5),
     ('cont3', '2022-01-16', 5),
     ('cont4', '2022-02-14', 1),
     ('cont5', '2022-02-15', 1),
     ('cont6', '2022-02-16', 1)],
    ['contract_name', 'start_date', 'length'])
df = df.withColumn('start_date', F.col('start_date').cast('date'))

dates = F.expr("sequence(start_date, add_months(current_date(), length), make_interval(0, length))")
df = df.withColumn('next_date', F.element_at(dates, -1))

df.show()
#  ------------- ---------- ------ ---------- 
# |contract_name|start_date|length| next_date|
#  ------------- ---------- ------ ---------- 
# |        cont1|2022-01-14|     5|2022-11-14|
# |        cont2|2022-01-15|     5|2022-11-15|
# |        cont3|2022-01-16|     5|2022-06-16|
# |        cont4|2022-02-14|     1|2022-07-14|
# |        cont5|2022-02-15|     1|2022-07-15|
# |        cont6|2022-02-16|     1|2022-06-16|
#  ------------- ---------- ------ ---------- 
  • Related