Home > Mobile >  Find last occurrence of a space in a string & replace with ':' in Spark dataframe
Find last occurrence of a space in a string & replace with ':' in Spark dataframe

Time:12-21

I have dates & time separated by a space in a string format in spark dataframe column like this -

DTC
11 AUGUST 2012 10:12
12 AUGUST 2012 10:12
13 AUGUST 2012 10:12

I want to replace last space in each date with ':' in the same column. Output should look like this

DTC
11 AUGUST 2012:10:12
12 AUGUST 2012:10:12
13 AUGUST 2012:10:12

Any suggestions ?

CodePudding user response:

Match on

^(.*) (.*)$

and substitute with

\1:\2

Match:

  • ^ - start of line anchor
  • (.*) - match and capture zero or more characters followed by a space, greedy
  • (.*) - match and capture zero or more characters
  • $ - end of line anchor

Substitute:

  • \1 - the first capture
  • : - a literal :
  • \2 - the second capture

Demo

CodePudding user response:

Date and timestamp tranformations are well catered for in python. Acquint yourself with the date and timestamp functions and will save you heaps.

coerce the string to timestamp using pyspark's to to_timestamp and the format using date_format

Data

 df=spark.createDataFrame([ ( 1 , '11 AUGUST 2012 10:12' ),
     ( 1  ,'12 AUGUST 2012 10:12')],
    ('id' ,'DTC'  ))

Code

df.withColumn('DTC',date_format(to_timestamp('DTC',"dd MMMM yyyy HH:mm"),"dd MMMM yyyy:HH:mm")).show()

outcome

 --- -------------------- 
| id|                 DTC|
 --- -------------------- 
|  1|11 August 2012:10:12|
|  1|12 August 2012:10:12|
  • Related