I have a dataset like the one below:
df = spark.sql("select '210927' as t_date")
Now I want to convert it to '27-09-2021'. Below is my code along with the error:
>>> df = df.withColumn("modifiedDate",datetime.strptime("t_date", "%y%m%d").strftime('%d-%m-%Y'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "/usr/lib/python3.8/_strptime.py", line 349, in _strptime
raise ValueError("time data %r does not match format %r" %
ValueError: time data 't_date' does not match format '%y%m%d'
I tried several examples from SO and this link, but none of them worked. I am using Python with PySpark.
CodePudding user response:
How about
from datetime import datetime
x = '210927'
year = 2000 + int(x[:2])
month = int(x[2:4])
day = int(x[4:])
dt = datetime(year=year, month=month, day=day)
print(dt)
output
2021-09-27 00:00:00
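If the goal is the '27-09-2021' string rather than a datetime object, the slicing can probably be replaced by strptime plus strftime. This is only a sketch: reformat_yymmdd is a made-up helper name, and it relies on Python's %y rule (69-99 map to 1969-1999, 00-68 map to 2000-2068) instead of always adding 2000.

```python
from datetime import datetime


def reformat_yymmdd(value: str) -> str:
    """Convert a 'yyMMdd' string such as '210927' to 'dd-MM-yyyy'.

    Hypothetical helper, not part of any library.
    """
    # %y uses the POSIX pivot: 69-99 -> 1969-1999, 00-68 -> 2000-2068
    parsed = datetime.strptime(value, "%y%m%d")
    return parsed.strftime("%d-%m-%Y")


print(reformat_yymmdd("210927"))  # 27-09-2021
```

Note that this works on a plain Python string, not on a Spark column, which is why passing the column name "t_date" to strptime fails in the question.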
CodePudding user response:
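The code below worked for me:
from datetime import datetime
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
df = spark.createDataFrame(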
[
(1, "210927"),
(2, "210928"),
(3, "210929"),
(4, "210930"),
(5, "211001"),
],
StructType(
[
StructField("id", IntegerType(), False),
StructField("col", StringType(), False),
]
),
)
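# Collect everything to the driver as a pandas DataFrame (small data only)
pDF = df.toPandas()
valuesList = pDF['col'].to_list()
modifiedList = list()
# Parse each 'yyMMdd' string and re-format it as 'dd-MM-yyyy'
for i in valuesList:
    modifiedList.append(datetime.strptime(i, "%y%m%d").strftime('%d-%m-%Y'))
pDF['t_date1'] = modifiedList
# Convert back to a Spark DataFrame
df = spark.createDataFrame(pDF)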
Might help someone someday!
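For what it's worth, the error in the question comes from datetime.strptime receiving the literal string "t_date" instead of the column's values. A way to stay inside Spark, without collecting to pandas, is to combine to_date and date_format from pyspark.sql.functions. The sketch below assumes the column holds strings like '210927'; add_modified_date and the two constants are hypothetical names, and how a two-digit year is resolved can depend on the Spark version and its time parser settings.

```python
SRC_FMT = "yyMMdd"      # pattern of the incoming string, e.g. '210927'
DST_FMT = "dd-MM-yyyy"  # pattern of the desired output string


def add_modified_date(df, src="t_date", dst="modifiedDate"):
    """Return df with a re-formatted date string column (hypothetical helper)."""
    # Imported inside the function so this file loads even without Spark installed
    from pyspark.sql import functions as F

    # Parse the string column into a DateType column, then format it back
    parsed = F.to_date(F.col(src), SRC_FMT)
    return df.withColumn(dst, F.date_format(parsed, DST_FMT))


# Usage, assuming an active SparkSession named `spark`:
# df = spark.sql("select '210927' as t_date")
# add_modified_date(df).show()
```

Because the conversion is expressed as column functions, it runs on the executors and should scale better than the toPandas round trip.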