Convert time to seconds in pyspark


Input:

ID  PERSONAL_AUX  OUTBOUND_AUX
1   0:09:00       0:00:08
1   0:19:00       0:00:40
1   0:13:00       0:00:33
1   0:08:00       0:00:22

Output:

ID  PERSONAL_AUX  OUTBOUND_AUX  PERSONAL_AUX_SEC  OUTBOUND_AUX_SEC
1   0:09:00       0:00:08       540               8
1   0:19:00       0:00:40       1140              40

Hello all, can somebody help me convert time into seconds? I am new to PySpark. TIA.

CodePudding user response:

If your pattern remains consistent, you can split each value into its individual HH:MM:SS components, and then generate the required column by converting each parsed component into seconds.
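The split-and-convert arithmetic can be sketched in plain Python (outside Spark) to make the step explicit; `hms_to_seconds` here is an illustrative helper, not part of the Spark code below:

```python
def hms_to_seconds(t: str) -> int:
    """Convert an 'H:MM:SS' string to total seconds."""
    hours, minutes, seconds = (int(part) for part in t.split(':'))
    return hours * 3600 + minutes * 60 + seconds

print(hms_to_seconds('0:09:00'))  # 540
print(hms_to_seconds('0:00:08'))  # 8
```

The Spark version below applies the same idea column-wise with `F.split` and integer casts.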

Data Preparation

import pandas as pd
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

d = {
    'ID': [1] * 4,
    'PERSONAL_AUX': ['0:09:00', '0:19:00', '0:13:00', '0:08:00'],
    'OUTBOUND_AUX': ['0:00:08', '0:00:40', '0:00:33', '0:00:22']
}

sparkDF = spark.createDataFrame(pd.DataFrame(d))

+---+------------+------------+
| ID|PERSONAL_AUX|OUTBOUND_AUX|
+---+------------+------------+
|  1|     0:09:00|     0:00:08|
|  1|     0:19:00|     0:00:40|
|  1|     0:13:00|     0:00:33|
|  1|     0:08:00|     0:00:22|
+---+------------+------------+

Parse & Split Pattern

# Split each time string once and reuse the resulting column expression
personal = F.split(F.col('PERSONAL_AUX'), ':')
outbound = F.split(F.col('OUTBOUND_AUX'), ':')

sparkDF = sparkDF.withColumn('PERSONAL_AUX_HOUR', personal.getItem(0).cast(IntegerType())) \
                 .withColumn('PERSONAL_AUX_MIN', personal.getItem(1).cast(IntegerType())) \
                 .withColumn('PERSONAL_AUX_SEC', personal.getItem(2).cast(IntegerType())) \
                 .withColumn('OUTBOUND_AUX_HOUR', outbound.getItem(0).cast(IntegerType())) \
                 .withColumn('OUTBOUND_AUX_MIN', outbound.getItem(1).cast(IntegerType())) \
                 .withColumn('OUTBOUND_AUX_SEC', outbound.getItem(2).cast(IntegerType()))

Standardisation to Seconds

sparkDF = sparkDF.withColumn('PERSONAL_AUX_SEC', F.col('PERSONAL_AUX_HOUR') * 3600 + F.col('PERSONAL_AUX_MIN') * 60 + F.col('PERSONAL_AUX_SEC')) \
                 .withColumn('OUTBOUND_AUX_SEC', F.col('OUTBOUND_AUX_HOUR') * 3600 + F.col('OUTBOUND_AUX_MIN') * 60 + F.col('OUTBOUND_AUX_SEC'))
sparkDF = sparkDF.drop(*['PERSONAL_AUX_HOUR','PERSONAL_AUX_MIN','OUTBOUND_AUX_HOUR','OUTBOUND_AUX_MIN'])
                             
sparkDF.show()

+---+------------+------------+----------------+----------------+
| ID|PERSONAL_AUX|OUTBOUND_AUX|PERSONAL_AUX_SEC|OUTBOUND_AUX_SEC|
+---+------------+------------+----------------+----------------+
|  1|     0:09:00|     0:00:08|             540|               8|
|  1|     0:19:00|     0:00:40|            1140|              40|
|  1|     0:13:00|     0:00:33|             780|              33|
|  1|     0:08:00|     0:00:22|             480|              22|
+---+------------+------------+----------------+----------------+

CodePudding user response:

You can use the unix_timestamp function to convert a time string to seconds:

from pyspark.sql import functions as F

# `input` refers to the source DataFrame from the question
df = input \
    .withColumn("PERSONAL_AUX_SEC", F.unix_timestamp(F.col("PERSONAL_AUX"), 'H:mm:ss')) \
    .withColumn("OUTBOUND_AUX_SEC", F.unix_timestamp(F.col("OUTBOUND_AUX"), 'H:mm:ss'))
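One caveat: unix_timestamp returns seconds since the Unix epoch. With a time-only pattern the date is anchored at 1970-01-01, so the result equals seconds since midnight only when the session timezone (spark.sql.session.timeZone) is UTC; otherwise the timezone offset is mixed in. The epoch-difference semantics can be illustrated in plain Python (the `epoch_seconds` helper is hypothetical, just mirroring the idea under a UTC assumption):

```python
from datetime import datetime

def epoch_seconds(t: str) -> int:
    # Mirror of F.unix_timestamp(col, 'H:mm:ss') with a UTC session timezone:
    # anchor the time-only string to 1970-01-01, then take the offset from
    # the Unix epoch in whole seconds.
    epoch = datetime(1970, 1, 1)
    parsed = datetime.strptime(t, '%H:%M:%S').replace(year=1970)
    return int((parsed - epoch).total_seconds())

print(epoch_seconds('0:19:00'))  # 1140
print(epoch_seconds('0:00:40'))  # 40
```

If your cluster's session timezone is not UTC, either set it before calling unix_timestamp or use the split-and-multiply approach from the first answer, which is timezone-independent.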