How to get 1st day of the year in pyspark

Time:02-24

I have a date variable that I need to pass to various functions.

For example, if the variable holds 12/09/2021, the result should be 01/01/2021.

How do I get the first day of the year in PySpark?

CodePudding user response:

You can use the trunc function, which truncates a date to the start of the given unit (here, the year).

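from pyspark.sql import SparkSession, functions as f

spark = SparkSession.builder.getOrCreate()

# One-row DataFrame with no columns, used only to hold the derived columns
df = spark.createDataFrame([()], [])
(
    df
    .withColumn('current_date', f.current_date())
    # trunc() resets the date to the first day of the given unit
    .withColumn("year_start", f.trunc("current_date", "year"))
    .show()
)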

# Output
+------------+----------+
|current_date|year_start|
+------------+----------+
|  2022-02-23|2022-01-01|
+------------+----------+

CodePudding user response:

x = '12/09/2021'

'01/01/' + x[-4:]
output: '01/01/2021'
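The slicing above relies on the year being the last four characters. If the value is a plain Python string rather than a DataFrame column, a slightly more defensive sketch is to parse it with the standard datetime module, which also rejects malformed input. The helper name first_day_of_year and the default %d/%m/%Y pattern are assumptions, not part of the original answer.

```python
from datetime import datetime


def first_day_of_year(date_str: str, fmt: str = "%d/%m/%Y") -> str:
    """Return 1 January of the year in date_str, in the same format."""
    # strptime raises ValueError if the string does not match fmt
    parsed = datetime.strptime(date_str, fmt)
    return parsed.replace(month=1, day=1).strftime(fmt)


print(first_day_of_year("12/09/2021"))  # prints 01/01/2021
```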

CodePudding user response:

You can achieve this by combining date_trunc with to_date. The to_date call is needed because date_trunc returns a Timestamp rather than a Date.

Data Preparation

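import pandas as pd
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = pd.DataFrame({
    'Date': ['2021-01-23', '2002-02-09', '2009-09-19'],
})

# Convert the pandas DataFrame to a Spark DataFrame
sparkDF = spark.createDataFrame(df)

sparkDF.show()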

+----------+
|      Date|
+----------+
|2021-01-23|
|2002-02-09|
|2009-09-19|
+----------+

Date Trunc & To Date

# date_trunc returns a timestamp; wrapping it in to_date converts it to a date
sparkDF = sparkDF.withColumn('first_day_year_dt', F.to_date(F.date_trunc('year', F.col('Date')), 'yyyy-MM-dd'))\
                 .withColumn('first_day_year_timestamp', F.date_trunc('year', F.col('Date')))

sparkDF.show()

+----------+-----------------+------------------------+
|      Date|first_day_year_dt|first_day_year_timestamp|
+----------+-----------------+------------------------+
|2021-01-23|       2021-01-01|     2021-01-01 00:00:00|
|2002-02-09|       2002-01-01|     2002-01-01 00:00:00|
|2009-09-19|       2009-01-01|     2009-01-01 00:00:00|
+----------+-----------------+------------------------+