I was trying to change the datatype of a column (Disponibility) from string type to date, but every time the converted column only contains null values (for example: 23/01/2022 becomes null).
This is my code:
dfwdate = dfworkers2.withColumn("Disponibility", to_date("Disponibility")) \
.show(truncate=False)
CodePudding user response:
You additionally need to supply the date format pattern to to_date; the available format patterns are listed on the Spark datetime pattern documentation page.
Date Conversion Examples
s = StringIO("""
date_str
2022-03-01
2022-05-20
2022-06-21
2022-10-22
""")
df = pd.read_csv(s,delimiter=',')
sparkDF = sql.createDataFrame(df)\
.withColumn('date_parsed',F.to_date(F.col('date_str'), 'yyyy-MM-dd'))\
.drop('date_str')
sparkDF.show()
+-----------+
|date_parsed|
+-----------+
| 2022-03-01|
| 2022-05-20|
| 2022-06-21|
| 2022-10-22|
+-----------+
sparkDF.printSchema()
root
|-- date_parsed: date (nullable = true)
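The question's column uses a day/month/year layout with slashes, so the pattern argument changes accordingly. A minimal sketch under that assumption, reusing the spark session and F import from the example above (the sample value is hypothetical):

dispDF = spark.createDataFrame([('23/01/2022',)], ['Disponibility']) \
    .withColumn('Disponibility', F.to_date(F.col('Disponibility'), 'dd/MM/yyyy'))

dispDF.show()

+-------------+
|Disponibility|
+-------------+
|   2022-01-23|
+-------------+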
CodePudding user response:
to_date('Disponibility', 'dd/MM/yyyy')
You have correctly chosen the function to_date. It's just that in your case it requires another parameter - the actual format of your date string, 'dd/MM/yyyy'. By default, to_date expects the format 'yyyy-MM-dd'. Since your column is not in that format, you get null returned.
Full example:
from pyspark.sql import functions as F
df = spark.createDataFrame([('23/01/2022',)], ['Disponibility'])
df.show()
# +-------------+
# |Disponibility|
# +-------------+
# |   23/01/2022|
# +-------------+
print(df.dtypes)
# [('Disponibility', 'string')]
df = df.withColumn('Disponibility', F.to_date('Disponibility', 'dd/MM/yyyy'))
df.show()
# +-------------+
# |Disponibility|
# +-------------+
# |   2022-01-23|
# +-------------+
print(df.dtypes)
# [('Disponibility', 'date')]
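For comparison, a minimal sketch of why the original code produced nulls: without the format argument, to_date falls back to the default 'yyyy-MM-dd' parse, which cannot read '23/01/2022' (assuming the same spark session; the DataFrame is rebuilt here because df was already converted above):

# Rebuild the original string column to show why the question's code returned null
df2 = spark.createDataFrame([('23/01/2022',)], ['Disponibility'])
df2.withColumn('Disponibility', F.to_date('Disponibility')).show()
# +-------------+
# |Disponibility|
# +-------------+
# |         null|
# +-------------+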