I want to convert a column of minutes to hours:minutes format.
col(min) |
---|
685 |
I want to obtain:
col(min) | col1(h:min) |
---|---|
685 | 11:25 |
CodePudding user response:
Use the SQL functions div and mod to get the quotient and remainder respectively, and then concatenate them.
from pyspark.sql import functions as F
df = df.withColumn('col1', F.expr('concat(div(col, 60), ":", mod(col, 60))'))
CodePudding user response:
You can use .map to transform data from an RDD into one or more columns.
Python's built-in function divmod returns the quotient and remainder of an integer division. divmod(a, b) is equivalent to (a // b, a % b).
# sc is assumed to be an existing SparkContext
rdd = sc.parallelize([685, 180, 80])
results = rdd.map(lambda x: divmod(x, 60))
print(results.collect())
# [(11, 25), (3, 0), (1, 20)]
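The arithmetic can also be checked without Spark. Here is a minimal plain-Python sketch; the samples list is just an illustrative name that reuses the values from the RDD above.

```python
# Plain Python check of the divmod identity; no Spark needed.
samples = [685, 180, 80]

# Split each number of minutes into (hours, remaining minutes).
pairs = [divmod(m, 60) for m in samples]

# divmod(a, b) gives the same pair as (a // b, a % b).
assert pairs == [(m // 60, m % 60) for m in samples]

print(pairs)
# [(11, 25), (3, 0), (1, 20)]
```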
Or, if you want the result as strings in hh:mm format, use str.format to format the values to your liking:
results = rdd.map(lambda x: '{:02d}:{:02d}'.format(*divmod(x, 60)))
print(results.collect())
# ['11:25', '03:00', '01:20']
If you want to keep both the number of minutes and the resulting hh:mm string:
results = rdd.map(lambda x: (x, '{:02d}:{:02d}'.format(*divmod(x, 60))))
print(results.collect())
# [(685, '11:25'), (180, '03:00'), (80, '01:20')]
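If the formatting gets reused, the lambda could be pulled out into a named function. The sketch below is one possible way to do it: minutes_to_hhmm is a hypothetical helper name, and the handling of negative values is an assumption, since the question only shows positive minutes. Note that divmod on a negative number floors the quotient (divmod(-5, 60) is (-1, 55)), so the sign is handled separately here.

```python
def minutes_to_hhmm(total_minutes):
    """Format a whole number of minutes as 'hh:mm' (hypothetical helper)."""
    # Assumption: a negative duration should be shown as '-hh:mm'.
    sign = '-' if total_minutes < 0 else ''
    hours, minutes = divmod(abs(total_minutes), 60)
    return '{}{:02d}:{:02d}'.format(sign, hours, minutes)

print([minutes_to_hhmm(m) for m in (685, 180, 80, -5)])
# ['11:25', '03:00', '01:20', '-00:05']
```

With the RDD from this answer, it could be passed directly as rdd.map(minutes_to_hhmm).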