Home > Software design >  Spark Explode between 2 Dates
Spark Explode between 2 Dates

Time:07-20

I am using spark 3.2, and trying to explode data between 2 months, so that I can get a column which contains all months between those 2 dates.

dataSetFromFile = dataSetFromFile.withColumn("MONTH",functions.explode(functions.sequence( functions.lit("2016-02-01").cast(DataTypes.DateType), functions.lit("2016-05-31").cast(DataTypes.DateType), functions.expr("INTERVAL 1 month"))));}

It's giving me output as

MONTH 2016-02-01 2016-03-01 2016-03-31 2016-04-30 2016-05-31

Expected Output should be:

MONTH 2016-02-01 2016-03-01 2016-04-01 2016-05-01

Can you please let us know where I did wrong?

CodePudding user response:

The problem is related to timezone.

Solution 1: Change DateType to TimestampType:

dataSetFromFile = dataSetFromFile.withColumn("MONTH",functions.explode(functions.sequence( functions.lit("2016-02-01").cast(DataTypes.TimestampType), functions.lit("2016-05-31").cast(DataTypes.TimestampType), functions.expr("INTERVAL 1 month"))));}

Solution 2: Set timezone to UTC:

spark.conf.set("spark.sql.session.timeZone", "UTC")
dataSetFromFile = dataSetFromFile.withColumn("MONTH",functions.explode(functions.sequence( functions.lit("2016-02-01").cast(DataTypes.DateType), functions.lit("2016-05-31").cast(DataTypes.DateType), functions.expr("INTERVAL 1 month"))));}
  • Related