Input:
ID | PERSONAL_AUX | OUTBOUND_AUX |
---|---|---|
1 | 0:09:00 | 0:00:08 |
1 | 0:19:00 | 0:00:40 |
1 | 0:13:00 | 0:00:33 |
1 | 0:08:00 | 0:00:22 |
Output:
ID | PERSONAL_AUX | OUTBOUND_AUX | PERSONAL_AUX_SEC | OUTBOUND_AUX_SEC |
---|---|---|---|---|
1 | 0:09:00 | 0:00:08 | 540 | 8 |
1 | 0:19:00 | 0:00:40 | 1140 | 40 |
Hello all, can somebody help me convert these times into seconds? I am new to PySpark. TIA
CodePudding user response:
If your pattern remains consistent, you can split each value into its individual HH:MM:SS components,
and then generate the required column by converting each parsed component into seconds.
Data Preparation
import pandas as pd
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import IntegerType

sql = SparkSession.builder.getOrCreate()  # the SparkSession used below

d = {
    'ID': [1] * 4,
    'PERSONAL_AUX': ['0:09:00', '0:19:00', '0:13:00', '0:08:00'],
    'OUTBOUND_AUX': ['0:00:08', '0:00:40', '0:00:33', '0:00:22']
}
sparkDF = sql.createDataFrame(pd.DataFrame(d))
sparkDF.show()
+---+------------+------------+
| ID|PERSONAL_AUX|OUTBOUND_AUX|
+---+------------+------------+
|  1|     0:09:00|     0:00:08|
|  1|     0:19:00|     0:00:40|
|  1|     0:13:00|     0:00:33|
|  1|     0:08:00|     0:00:22|
+---+------------+------------+
Parse & Split Pattern
# Split each H:MM:SS string on ':' and cast the parts to integers
sparkDF = sparkDF.withColumn('PERSONAL_AUX_HOUR',F.split(F.col('PERSONAL_AUX'),':').getItem(0).cast(IntegerType()))\
                 .withColumn('PERSONAL_AUX_MIN',F.split(F.col('PERSONAL_AUX'),':').getItem(1).cast(IntegerType()))\
                 .withColumn('PERSONAL_AUX_SEC',F.split(F.col('PERSONAL_AUX'),':').getItem(2).cast(IntegerType()))\
                 .withColumn('OUTBOUND_AUX_HOUR',F.split(F.col('OUTBOUND_AUX'),':').getItem(0).cast(IntegerType()))\
                 .withColumn('OUTBOUND_AUX_MIN',F.split(F.col('OUTBOUND_AUX'),':').getItem(1).cast(IntegerType()))\
                 .withColumn('OUTBOUND_AUX_SEC',F.split(F.col('OUTBOUND_AUX'),':').getItem(2).cast(IntegerType()))
Standardisation to Seconds
# Combine the parsed components: hours*3600 + minutes*60 + seconds
sparkDF = sparkDF.withColumn('PERSONAL_AUX_SEC',F.col('PERSONAL_AUX_HOUR')*3600 + F.col('PERSONAL_AUX_MIN')*60 + F.col('PERSONAL_AUX_SEC'))\
                 .withColumn('OUTBOUND_AUX_SEC',F.col('OUTBOUND_AUX_HOUR')*3600 + F.col('OUTBOUND_AUX_MIN')*60 + F.col('OUTBOUND_AUX_SEC'))
sparkDF = sparkDF.drop(*['PERSONAL_AUX_HOUR','PERSONAL_AUX_MIN','OUTBOUND_AUX_HOUR','OUTBOUND_AUX_MIN'])
sparkDF.show()
+---+------------+------------+----------------+----------------+
| ID|PERSONAL_AUX|OUTBOUND_AUX|PERSONAL_AUX_SEC|OUTBOUND_AUX_SEC|
+---+------------+------------+----------------+----------------+
|  1|     0:09:00|     0:00:08|             540|               8|
|  1|     0:19:00|     0:00:40|            1140|              40|
|  1|     0:13:00|     0:00:33|             780|              33|
|  1|     0:08:00|     0:00:22|             480|              22|
+---+------------+------------+----------------+----------------+
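As an aside, the six intermediate columns aren't strictly needed: the same split-and-combine arithmetic can be written once in a small helper. This is just a sketch reusing the imports above; the name to_seconds is illustrative, not part of any API.
def to_seconds(col_name):
    # Split H:MM:SS on ':' and combine the parts into total seconds
    parts = F.split(F.col(col_name), ':')
    return (parts.getItem(0).cast(IntegerType()) * 3600
            + parts.getItem(1).cast(IntegerType()) * 60
            + parts.getItem(2).cast(IntegerType()))

sparkDF = sparkDF.withColumn('PERSONAL_AUX_SEC', to_seconds('PERSONAL_AUX'))\
                 .withColumn('OUTBOUND_AUX_SEC', to_seconds('OUTBOUND_AUX'))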
CodePudding user response:
You can use the unix_timestamp function to convert the time strings to seconds. With the format 'H:mm:ss' each value is parsed as a time of day on 1970-01-01, so the returned epoch seconds equal the total seconds in the string (note that unix_timestamp interprets the input in the Spark session's time zone, so this assumes the session time zone is UTC).
from pyspark.sql import functions as F
# 'input' is the original DataFrame from the question
df = input \
    .withColumn("Personal_aux_sec", F.unix_timestamp(F.col("PERSONAL_AUX"), 'H:mm:ss')) \
    .withColumn("outbound_aux_sec", F.unix_timestamp(F.col("OUTBOUND_AUX"), 'H:mm:ss'))