Home > Back-end >  Parse different formats of date in string format to date format pyspark when clause
Parse different formats of date in string format to date format pyspark when clause

Time:09-29

Need to convert this format to

name,code,DATE_invoice
Ram,E01,09/29/2018
Mara,E02,07/14/2017
Test,E03,01/01/18

this:

name,code,DATE_invoice
Ram,E01,2018-09-29
Mara,E02,2017-07-14
Test,E03,2018-01-01

CodePudding user response:

If the column is already a date, this should do the job:

df = df.withColumn('DATE_invoice', date_format(col("DATE_invoice"), "yyyy-MM-dd")))

CodePudding user response:

To Parse different data formats you can utilise to_date along with coalesce

You can utilise the same approach towards multiple patterns within your dataset , and example can be found here

Data preparation

input_str = """
Ram,E01,09/29/2018,
Mara,E02,07/14/2017,
Test,E03,01/01/18
""".split(",")

input_values = list(map(lambda x: x.strip() if x.strip() != 'null' else None, input_str))

cols = list(map(lambda x: x.strip() if x.strip() != 'null' else None, "name,code,DATE_invoice".split(",")))
            
n = len(input_values)
n_col = 3

input_list = [tuple(input_values[i:i n_col]) for i in range(0,n,n_col)]

sparkDF = sql.createDataFrame(input_list, cols)

sparkDF.show()

 ---- ---- ------------ 
|name|code|DATE_invoice|
 ---- ---- ------------ 
| Ram| E01|  09/29/2018|
|Mara| E02|  07/14/2017|
|Test| E03|    01/01/18|
 ---- ---- ------------ 

To Date and Coalesce

sql.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

sparkDF.withColumn('p1',F.to_date(F.col('DATE_invoice'),"MM/dd/yyyy"))\
       .withColumn('p2',F.to_date(F.col('DATE_invoice'),"MM/dd/yy"))\
       .withColumn('DATE_invoice_parsed',F.coalesce(F.col('p1'),F.col('p2')))\
       .drop(*['p1','p2'])\
       .show(truncate=False)

 ---- ---- ------------ ------------------- 
|name|code|DATE_invoice|DATE_invoice_parsed|
 ---- ---- ------------ ------------------- 
|Ram |E01 |09/29/2018  |2018-09-29         |
|Mara|E02 |07/14/2017  |2017-07-14         |
|Test|E03 |01/01/18    |0018-01-01         |
 ---- ---- ------------ ------------------- 
  • Related