I am converting a string column of a DataFrame to a datetime using PySpark. Here is my input:
+--------------+
|          col1|
+--------------+
|18300031121994|
|18300031122018|
|12324031012020|
|19590031052020|
|19590030062020|
+--------------+
Expected output:
col1
1994-12-31 18:30:00
2018-12-31 18:30:00
2020-01-31 12:32:40
2020-05-31 19:59:00
2020-06-30 19:59:00
Here is my snippet:
df.select(col("col1"),to_date(col("col1"),"hhmmssMMddyyyy").alias("datetime")).show()
When I execute the above code, it gives the same output as the input. Please help me find where I am going wrong.
CodePudding user response:
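You need to use the correct format. Your values are laid out as hour (24-hour clock), minute, second, day, month, year, so the pattern for the data you have provided is "HHmmssddMMyyyy". Use HH rather than hh, because hh is the 12-hour clock and values such as 18 or 19 are out of its range, and put dd before MM. Also use to_timestamp instead of to_date, since to_date returns only the date and drops the time. Try this:
from pyspark.sql.functions import col, to_timestamp
df.select(col("col1"), to_timestamp(col("col1"), "HHmmssddMMyyyy").alias("datetime")).show()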
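If you want to sanity-check the field order without a Spark session, here is a minimal sketch in plain Python. It assumes the values are laid out as hour, minute, second, day, month, year, which is inferred from the expected output in the question. The names PY_FORMAT and parse_sample are made up for this illustration, and "%H%M%S%d%m%Y" is the strptime counterpart of that layout, not a Spark pattern.

```python
from datetime import datetime

# Sample values copied from the question's input column.
samples = [
    "18300031121994",
    "18300031122018",
    "12324031012020",
    "19590031052020",
    "19590030062020",
]

# Assumed layout: 2-digit hour (24h), minute, second, day, month, 4-digit year.
# This is Python strptime syntax, used only to check the field order.
PY_FORMAT = "%H%M%S%d%m%Y"


def parse_sample(value: str) -> datetime:
    """Parse one packed value using the assumed layout (hypothetical helper)."""
    return datetime.strptime(value, PY_FORMAT)


# str(datetime) renders as "YYYY-MM-DD HH:MM:SS", the shape the question asks for.
parsed = [str(parse_sample(s)) for s in samples]

for line in parsed:
    print(line)
```

If the printed values match the expected output in the question, the field order is right and the same order should carry over to the Spark pattern.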
CodePudding user response:
Here, try this. Use to_timestamp with a pattern that matches your data (24-hour time first, then day, month, year):
from pyspark.sql.functions import to_timestamp
df = df.withColumn("timestamp", to_timestamp(df.col1, "HHmmssddMMyyyy"))