Passing columns and not variables with string format for converting yymmdd to dd-mm-YYYY in python

Time:10-01

I have a dataset like the below:

df = spark.sql("select '210927' as t_date")

Now, I want to convert it to '27-09-2021'. Below is my code along with the error:

   >>> df = df.withColumn("modifiedDate",datetime.strptime("t_date", "%y%m%d").strftime('%d-%m-%Y'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data 't_date' does not match format '%y%m%d'

I tried several examples from SO and this link, but none of them worked. I am using Python with PySpark.

CodePudding user response:

How about

from datetime import datetime

x = '210927'
year = 2000 + int(x[:2])
month = int(x[2:4])
day = int(x[4:])
dt = datetime(year=year, month=month, day=day)
print(dt)

output

2021-09-27 00:00:00
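
As a side note on the traceback in the question: datetime.strptime is a plain Python function, so it parses the literal string "t_date" rather than the values stored in that column. The minimal sketch below reproduces this with the standard library only; the helper name yymmdd_to_ddmmyyyy is made up for illustration and is not part of any library.

```python
from datetime import datetime


def yymmdd_to_ddmmyyyy(value: str) -> str:
    """Convert a 'yymmdd' string such as '210927' to 'dd-mm-YYYY'."""
    return datetime.strptime(value, "%y%m%d").strftime("%d-%m-%Y")


# An actual cell value parses as expected.
print(yymmdd_to_ddmmyyyy("210927"))  # 27-09-2021

# The column *name* is just another string to strptime, so it fails
# in the same way as the call in the question.
try:
    yymmdd_to_ddmmyyyy("t_date")
except ValueError as exc:
    print(exc)  # time data 't_date' does not match format '%y%m%d'
```

This works for one Python string at a time. It does not accept a Spark column, which is why the withColumn call in the question fails before Spark ever sees it.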

CodePudding user response:

The code below worked for me:

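from datetime import datetime

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Assumes an active SparkSession named `spark`, as in the question.
df = spark.createDataFrame(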
    [
        (1, "210927"),
        (2, "210928"),
        (3, "210929"),
        (4, "210930"),
        (5, "211001"),
    ],
    StructType(
        [
            StructField("id", IntegerType(), False),
            StructField("col", StringType(), False),
        ]
    ),
)
 
pDF= df.toPandas()
valuesList = pDF['col'].to_list()
modifiedList = list()
 
for i in valuesList:
    modifiedList.append(datetime.strptime(i, "%y%m%d").strftime('%d-%m-%Y'))
 
pDF['t_date1']=modifiedList
 
df = spark.createDataFrame(pDF)

Might help someone someday!
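
An alternative that keeps the work inside Spark, instead of collecting every row to the driver with toPandas(), is to use the built-in column functions to_date and date_format from pyspark.sql.functions. The sketch below is one possible way to do that, not the method used above. The helper name with_modified_date and the two pattern constants are hypothetical names chosen for illustration, and the patterns assume Spark's datetime pattern syntax (yy, MM, dd) rather than Python's % codes.

```python
# Spark datetime patterns (assumed): two-digit year, month, day.
SRC_PATTERN = "yyMMdd"
DST_PATTERN = "dd-MM-yyyy"


def with_modified_date(df, src_col="t_date", dst_col="modifiedDate"):
    """Return df with dst_col holding src_col reformatted as dd-MM-yyyy."""
    # Imported inside the function so this module still loads on machines
    # where PySpark is not installed.
    from pyspark.sql import functions as F

    # Parse the string column into a date, then format it back to a string.
    parsed = F.to_date(F.col(src_col), SRC_PATTERN)
    return df.withColumn(dst_col, F.date_format(parsed, DST_PATTERN))
```

With the DataFrame from the question, this would be called as df = with_modified_date(spark.sql("select '210927' as t_date")), and the new column should then hold the date in dd-MM-yyyy form. Because the column reference is passed to Spark functions rather than to datetime.strptime, the conversion is evaluated per row by Spark.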
