Pyspark : Convert Julian Date to Calendar date

Time:05-18

I have a PySpark DataFrame column containing Julian dates, and I am trying to convert them to calendar dates.

number julian_date
1 17196
2 17199
3 17281

I tried the code below:

spdf = spdf.withColumn('date_new',functions.to_date(functions.from_unixtime("julian_date")))

However, this is the output I get:

number julian_date date_new
1 17196 1970-01-01
2 17199 1970-01-01
3 17281 1970-01-01

Please help. Thanks in advance

CodePudding user response:

A Julian date in this format consists of a 2-digit year followed by a 3-digit day of year.

For example, 17196 is the 196th day of 2017, which is 2017-07-15.
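
To check that reading outside Spark, the same decoding can be sketched with Python's standard library, where %y is the 2-digit year and %j is the day of year. The helper name julian_to_date is made up for this illustration and is not part of PySpark. The zero-padding step assumes the value may arrive as an integer, in which case years 2000 to 2009 would have lost their leading zero.

```python
from datetime import date, datetime


def julian_to_date(value) -> date:
    """Decode a yyDDD value (hypothetical helper, for illustration only)."""
    # Zero-pad to 5 characters so integer inputs such as 5123
    # (year 05, day 123) still split into yy + DDD.
    text = str(value).zfill(5)
    # %y = two-digit year, %j = day of year.
    return datetime.strptime(text, "%y%j").date()


# The three values from the question.
for raw in (17196, 17199, 17281):
    print(raw, julian_to_date(raw))
```

Note that Python's %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999; Spark's two-digit year handling may use a different cutoff, so check it if your data spans centuries.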

So you can use to_date with a format built from the year (y) and day-of-year (D) pattern letters. (ref: date pattern)

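from pyspark.sql import functions
from pyspark.sql.types import StringType

# Cast to string first in case julian_date is stored as a number,
# then parse it as 2-digit year + 3-digit day of year.
# withColumn returns a new DataFrame, so assign the result.
df = df.withColumn('date_new', functions.to_date(df.julian_date.cast(StringType()), 'yyDDD'))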
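
As for why the original attempt returned 1970-01-01 for every row: from_unixtime treats its input as a number of seconds since 1970-01-01 00:00:00 UTC, so a value like 17196 is less than five hours into that first day. A minimal plain-Python sketch of the same interpretation (evaluated in UTC here; Spark uses the session time zone):

```python
from datetime import datetime, timezone

# Interpret 17196 as seconds since the Unix epoch, the way from_unixtime does.
as_epoch = datetime.fromtimestamp(17196, tz=timezone.utc)

# The timestamp is still on 1970-01-01, so to_date() yields that day.
print(as_epoch)
print(as_epoch.date())
```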