I have dates & time separated by a space in a string format in spark dataframe column like this -
DTC
11 AUGUST 2012 10:12
12 AUGUST 2012 10:12
13 AUGUST 2012 10:12
I want to replace last space in each date with ':' in the same column. Output should look like this
DTC
11 AUGUST 2012:10:12
12 AUGUST 2012:10:12
13 AUGUST 2012:10:12
Any suggestions ?
CodePudding user response:
Match on
^(.*) (.*)$
and substitute with
\1:\2
Match:
^
- start of line anchor(.*)
- match and capture zero or more characters followed by a space, greedy(.*)
- match and capture zero or more characters$
- end of line anchor
Substitute:
\1
- the first capture:
- a literal:
\2
- the second capture
CodePudding user response:
Date and timestamp tranformations are well catered for in python. Acquint yourself with the date and timestamp functions and will save you heaps.
coerce the string to timestamp using pyspark's to to_timestamp
and the format using date_format
Data
df=spark.createDataFrame([ ( 1 , '11 AUGUST 2012 10:12' ),
( 1 ,'12 AUGUST 2012 10:12')],
('id' ,'DTC' ))
Code
df.withColumn('DTC',date_format(to_timestamp('DTC',"dd MMMM yyyy HH:mm"),"dd MMMM yyyy:HH:mm")).show()
outcome
--- --------------------
| id| DTC|
--- --------------------
| 1|11 August 2012:10:12|
| 1|12 August 2012:10:12|