I have a date variable that I need to pass to various functions.
For example, if the variable holds the date 12/09/2021, the result should be 01/01/2021.
How do I get the first day of the year in PySpark?
CodePudding user response:
You can use the trunc function, which truncates a date to the unit you specify (here, the year).
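from pyspark.sql import SparkSession, functions as f

spark = SparkSession.builder.getOrCreate()

# One-row DataFrame with no columns, used only to hold the derived columns
df = spark.createDataFrame([()], [])
(
    df
    .withColumn("current_date", f.current_date())
    # trunc(date, "year") resets month and day to January 1st
    .withColumn("year_start", f.trunc("current_date", "year"))
    .show()
)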
# Output
+------------+----------+
|current_date|year_start|
+------------+----------+
|  2022-02-23|2022-01-01|
+------------+----------+
CodePudding user response:
x = '12/09/2021'
'01/01/' + x[-4:]
output: '01/01/2021'
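If the value is a plain Python string rather than a DataFrame column, parsing it with the standard datetime module is a slightly more defensive variant of the slicing above, because malformed input raises an error instead of silently producing a wrong result. This is only a sketch: the helper name first_day_of_year is made up for illustration, and the default format %d/%m/%Y is an assumption, since the question does not say whether 12/09/2021 is day/month or month/day.

```python
from datetime import datetime


def first_day_of_year(date_str, fmt="%d/%m/%Y"):
    """Return January 1st of the year in date_str, formatted with fmt.

    Hypothetical helper; fmt is an assumed format, adjust it to your data.
    """
    parsed = datetime.strptime(date_str, fmt)  # raises ValueError on malformed input
    return parsed.replace(month=1, day=1).strftime(fmt)


print(first_day_of_year("12/09/2021"))  # 01/01/2021
```

The same helper works for other layouts by passing a different format, for example first_day_of_year("2021-12-09", "%Y-%m-%d").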
CodePudding user response:
You can achieve this by combining date_trunc with to_date, since date_trunc
returns a Timestamp rather than a Date.
Data Preparation
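import pandas as pd
from pyspark.sql import SparkSession, functions as F

sql = SparkSession.builder.getOrCreate()  # `sql` is assumed to be the SparkSession

df = pd.DataFrame({
    'Date': ['2021-01-23', '2002-02-09', '2009-09-19'],
})
sparkDF = sql.createDataFrame(df)
sparkDF.show()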
+----------+
|      Date|
+----------+
|2021-01-23|
|2002-02-09|
|2009-09-19|
+----------+
Date Trunc & To Date
sparkDF = (
    sparkDF
    # date_trunc returns a timestamp; to_date converts it to a date
    .withColumn('first_day_year_dt', F.to_date(F.date_trunc('year', F.col('Date')), 'yyyy-MM-dd'))
    .withColumn('first_day_year_timestamp', F.date_trunc('year', F.col('Date')))
)
sparkDF.show()
+----------+-----------------+------------------------+
|      Date|first_day_year_dt|first_day_year_timestamp|
+----------+-----------------+------------------------+
|2021-01-23|       2021-01-01|     2021-01-01 00:00:00|
|2002-02-09|       2002-01-01|     2002-01-01 00:00:00|
|2009-09-19|       2009-01-01|     2009-01-01 00:00:00|
+----------+-----------------+------------------------+