I have a PySpark DataFrame column with Julian dates. I tried to convert them to calendar dates.
number | julian_date |
---|---|
1 | 17196 |
2 | 17199 |
3 | 17281 |
I tried with the below code:
spdf = spdf.withColumn('date_new',functions.to_date(functions.from_unixtime("julian_date")))
However, this is the output I get:
number | julian_date | date_new |
---|---|---|
1 | 17196 | 1970-01-01 |
2 | 17199 | 1970-01-01 |
3 | 17281 | 1970-01-01 |
Please help. Thanks in advance
CodePudding user response:
This Julian date format consists of a 2-digit year followed by a 3-digit day-of-year.
For example, 17196 is the 196th day of 2017, which is 2017-07-15.
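To make the decoding concrete, here is a minimal plain-Python sketch of the same arithmetic, without Spark. The helper name `yyddd_to_date` and the `century` parameter are made up for illustration, and it assumes every two-digit year belongs to the 2000s.

```python
from datetime import date, timedelta


def yyddd_to_date(value: int, century: int = 2000) -> date:
    """Decode a YYDDD value such as 17196 into a calendar date.

    Assumption: the two-digit year belongs to `century` (2000 by default).
    """
    year = century + value // 1000   # leading two digits -> year
    day_of_year = value % 1000       # trailing three digits -> day of year
    # Day 1 is January 1st, so add (day_of_year - 1) days to it.
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


for julian in (17196, 17199, 17281):
    print(julian, yyddd_to_date(julian))
# 17196 2017-07-15
# 17199 2017-07-18
# 17281 2017-10-08
```

These are the dates you should expect in the date_new column for the three sample rows.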
Thus, you can use to_date with a format string built from the year (y) and day-of-year (D) pattern letters. (ref: date pattern)
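from pyspark.sql import functions
from pyspark.sql.types import StringType
df = df.withColumn('date_new', functions.to_date(df.julian_date, 'yyDDD'))
# If julian_date is not a string column, cast it first:
# df = df.withColumn('date_new', functions.to_date(df.julian_date.cast(StringType()), 'yyDDD'))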
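As for why the original attempt returned 1970-01-01 for every row: from_unixtime treats its input as a number of seconds since 1970-01-01 00:00:00 UTC, so values around 17000 land less than five hours after the epoch. The plain-Python sketch below reproduces that interpretation. It assumes UTC, whereas Spark renders the result in the session time zone, so the exact clock time may differ on your cluster.

```python
from datetime import datetime, timezone

# Interpret each value as seconds since the Unix epoch, evaluated in UTC.
as_epoch = [
    datetime.fromtimestamp(seconds, tz=timezone.utc)
    for seconds in (17196, 17199, 17281)
]

for ts in as_epoch:
    print(ts.isoformat())
# 1970-01-01T04:46:36+00:00
# 1970-01-01T04:46:39+00:00
# 1970-01-01T04:48:01+00:00
```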