Converting String type Date values to Date Format in pyspark


I am trying to convert string type values (mm/dd/yyyy) in a Date column into date format (dd-MM-yyyy). There are a total of 5 date-related columns in my file and I want to change them from (mm/dd/yyyy) into the proper date format (dd-MM-yyyy). How can I achieve this?

import pandas as pd

df = pd.DataFrame([['10/23/2019', '09/13/2021'], ['06/16/2020', '03/16/2021'],
                   ['09/06/2022', '12/23/2019']], columns=['A', 'B'])

Output will be like 23-10-2019, 13-09-2021
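
Since I apply PySpark functions to df below, here is the same sample data as a Spark DataFrame (a minimal sketch, assuming an active SparkSession named spark):

# same sample data as a Spark DataFrame (column names A and B as above)
df = spark.createDataFrame(
    [('10/23/2019', '09/13/2021'), ('06/16/2020', '03/16/2021'), ('09/06/2022', '12/23/2019')],
    schema=['A', 'B'])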

My code:

df = df.withColumn('date_col',to_date('Date_col', 'dd-MM-yy'))

The code runs fine but returns undefined values in the output for the date column. Since I have 5 date columns, is it possible to do this using a for loop?

CodePudding user response:

to_date converts a column to date type, so the format you pass to it describes the format your string is in, not the target format. To produce the target format, use the date_format function, which converts the date column back into a string in the format you specify.

Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.

df.select(date_format(to_date(df.t, 'MM/dd/yyyy'), "dd-MM-yyyy").alias('date')).show()
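
Applied to your column, a minimal sketch (assuming the string column is named Date_col, as in your attempt):

from pyspark.sql.functions import to_date, date_format

# parse the mm/dd/yyyy string first, then render it back as a dd-MM-yyyy string
df = df.withColumn('date_col', date_format(to_date('Date_col', 'MM/dd/yyyy'), 'dd-MM-yyyy'))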

If you only have 5 columns to convert and that number will not change dynamically, I suggest you just chain the calls (a for-loop version is sketched after this block):

df.withColumn("col1", date_format(to_date("dt1", MM/dd/yyyy'), "dd-MM-yyyy"))
.withColumn("col2", date_format(to_date("dt2", MM/dd/yyyy'), "dd-MM-yyyy"))
.withColumn("col3", date_format(to_date("dt3", MM/dd/yyyy'), "dd-MM-yyyy"))
.withColumn("col4", date_format(to_date("dt4", MM/dd/yyyy'), "dd-MM-yyyy"))
.withColumn("col5", date_format(to_date("dt5", MM/dd/yyyy'), "dd-MM-yyyy"))

CodePudding user response:

Here are my 2 cents

  1. Created a data frame with 2 columns (you can create as many columns as you need)

     df = spark.createDataFrame([('05/22/2022','10/21/2022')],schema=['Date_col1','Date_col2'])
    
  2. Use the to_date() followed by date_format()

     from pyspark.sql.functions import *
    
     df1 = df.withColumn('date_col1',date_format(to_date(col("Date_col1"),"MM/dd/yyyy"),'dd-MM-yyyy'))\
             .withColumn('date_col2',date_format(to_date(col("Date_col2"),"MM/dd/yyyy"),'dd-MM-yyyy'))
    
  3. Print the data frame

     df1.show()
    

The output shows both columns reformatted as dd-MM-yyyy strings: 22-05-2022 and 21-10-2022.
