Compute the difference in days between the max and the min of a date type column


My table loaded in PySpark has a column "Date" with the following data:

Date       Open   High   Low    Close  Volume    Adj Close
1/3/2012   59.97  61.06  59.87  60.33  12668800  52.61923
1/4/2012   60.21  60.35  59.47  59.71  9593300   52.07848
1/5/2012   59.35  59.62  58.37  59.42  12768200  51.82554
1/6/2012   59.42  59.45  58.87  59     8069400   51.45922

How do I calculate the difference, in days, between the max and the min of the column (so in the example above, the difference in days between 1/6/2012 and 1/3/2012)?

CodePudding user response:

Test data:

from pyspark.sql import functions as F

df = spark.createDataFrame([('2012-01-03',),('2013-02-03',),('2011-11-29',)], ['Date']).select(F.col('Date').cast('date'))
df.show()
# +----------+
# |      Date|
# +----------+
# |2012-01-03|
# |2013-02-03|
# |2011-11-29|
# +----------+

This will create a new dataframe containing the difference in days:

df_diff = df.groupBy().agg(F.datediff(F.max('Date'), F.min('Date')).alias('diff'))

df_diff.show()
# +----+
# |diff|
# +----+
# | 432|
# +----+

# If you need the difference in a variable:
v = df_diff.head().diff
print(v)
# 432
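Equivalently, you can collect just the min and max and do the subtraction driver-side in plain Python; Spark returns date columns as `datetime.date` objects. A sketch, with literal dates standing in for the row a call like `df.agg(F.min('Date'), F.max('Date')).head()` would return for the test data above:

```python
from datetime import date

# Stand-ins for the collected min and max of the test data
min_date, max_date = date(2011, 11, 29), date(2013, 2, 3)

# Subtracting two dates yields a timedelta; .days is the difference in days
diff_days = (max_date - min_date).days
print(diff_days)  # 432
```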

And this will add a new column to your existing df:

df = df.withColumn('diff', F.expr('datediff(max(Date) over(), min(Date) over())'))

df.show()
# +----------+----+
# |      Date|diff|
# +----------+----+
# |2012-01-03| 432|
# |2013-02-03| 432|
# |2011-11-29| 432|
# +----------+----+

CodePudding user response:

Supposing your dataframe df has only the column Date, already of date type, you can do the following. Note that the aggregates need a window spec: calling F.max / F.min inside withColumn without one raises an AnalysisException, so we use an empty window spanning the whole dataframe:

from pyspark.sql import Window, functions as F

w = Window.partitionBy()  # single partition covering all rows

(df.withColumn('Max_Date', F.max('Date').over(w))
   .withColumn('Min_Date', F.min('Date').over(w))
   .withColumn('Diff_days', F.datediff('Max_Date', 'Min_Date'))
   .drop('Date').dropDuplicates())

You can find more examples of PySpark SQL functions at this link: https://sparkbyexamples.com/spark/spark-sql-functions/
