PySpark string column to timestamp conversion

Time:09-25

I am currently learning PySpark and I need to convert a column of strings in the format 13/09/2021 20:45 into a timestamp containing just the hour, 20:45.

Now I figured I could do this with q1.withColumn("timestamp", to_timestamp("ts")).show() (where q1 is my dataframe and ts is the column in question) to convert my input from its dd/MM/yyyy HH:mm format, but the returned values are all null. I therefore realised that to_timestamp expects its input in PySpark's default timestamp format (yyyy-MM-dd HH:mm:ss), so my strings cannot be parsed as-is. Hence my question:

How can I convert the column of dd/MM/yyyy HH:mm strings into a format PySpark understands, so that I can then convert it to a timestamp?

CodePudding user response:

unix_timestamp may help with your problem.

Just try this: Convert pyspark string to date format

CodePudding user response:

There are different ways you can do that:

from pyspark.sql import functions as F

# use substring (characters 12-16 hold HH:mm)
df.withColumn('hour', F.substring('A', 12, 5)).show()

# use regex
df.withColumn('hour', F.regexp_extract('A', r'\d{2}:\d{2}', 0)).show()

# use unix_timestamp / from_unixtime
df.withColumn('hour', F.from_unixtime(F.unix_timestamp('A', 'dd/MM/yyyy HH:mm'), 'HH:mm')).show()

# Output
# +----------------+-----+
# |               A| hour|
# +----------------+-----+
# |13/09/2021 20:45|20:45|
# +----------------+-----+