Converting value from Pyspark Row datetime.date to yyyy-mm-dd


I am fetching data from a table, and the query returns a list of Row objects containing datetime.date values. I would like to have them as a list of varchar/string values instead.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def fetch_device_dates():
    query = "select device_date from device where device is not null"
    res = spark.sql(query).collect()
    if len(res) != 0:
        return res[:20]

The returned value looks like this:

[Row(device_date =datetime.date(2019, 9, 25)),
 Row(device_date =datetime.date(2019, 9, 17)),
 Row(device_date =datetime.date(2020, 1, 8))]

I would like to have the following output returned instead:

['2019-09-25','2019-09-17','2020-01-08']

Please advise.

CodePudding user response:

I suggest you use the date_format function before collecting. See the PySpark documentation for date_format, but basically:

>>> from pyspark.sql.functions import date_format
>>> df.select(date_format('device_date', 'yyyy-MM-dd').alias('date')).collect()
[Row(date='2015-04-08')]

Note that the pattern is case-sensitive: yyyy-MM-dd is what you want, since MM means month while mm means minutes.
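If you would rather keep the conversion in SQL, date_format is also available as a Spark SQL function, so it can go straight into the query (a sketch assuming the same hypothetical device table as the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# date_format in Spark SQL renders the date column as a string,
# so collect() returns strings rather than datetime.date objects
query = """
    select date_format(device_date, 'yyyy-MM-dd') as device_date
    from device
    where device_date is not null
"""
res = spark.sql(query).collect()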

Also, be careful: it looks like your column name, "device_date ", has a trailing space. That could make your life harder.
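If the space really is part of the column name, one simple fix is to rename the column up front (a minimal sketch using the standard withColumnRenamed DataFrame method):

# Hypothetical cleanup: rename "device_date " (trailing space) to "device_date"
df = df.withColumnRenamed("device_date ", "device_date")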

CodePudding user response:

Are you sure you want to collect your data and then process it in Python?

Depending on the answer:

YES (Python solution)

import datetime

out = df.collect()

# Format each datetime.date in the collected Rows as a "yyyy-mm-dd" string
list(map(lambda x: datetime.datetime.strftime(x.device_date, "%Y-%m-%d"), out))

['2019-09-25', '2019-09-17', '2020-01-08']
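As a side note, datetime.date.isoformat() already produces exactly this yyyy-mm-dd form, so a list comprehension over the same collected rows needs no format string at all:

out = df.collect()

# isoformat() on a datetime.date returns "YYYY-MM-DD" directly
[row.device_date.isoformat() for row in out]

['2019-09-25', '2019-09-17', '2020-01-08']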

NO (Spark solution)

from pyspark.sql import functions as F

# "yyyy-MM-dd" is the correct Spark pattern (MM = month, mm = minute)
df.select(F.date_format("device_date", "yyyy-MM-dd").alias("device_date")).collect()

[Row(device_date='2019-09-25'),
 Row(device_date='2019-09-17'),
 Row(device_date='2020-01-08')]
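Either way, collect() still returns Row objects; to get the plain list of strings the question asks for, pull the single field out of each Row (a short sketch reusing the select above):

rows = df.select(F.date_format("device_date", "yyyy-MM-dd").alias("device_date")).collect()

# Each Row has one string field; extract it into a plain Python list
[row.device_date for row in rows]

['2019-09-25', '2019-09-17', '2020-01-08']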