Home > Enterprise >  round to precision value based on another column pyspark
round to precision value based on another column pyspark

Time:11-03

I have a data frame like below in pyspark

import pyspark.sql.functions as f

df = spark.createDataFrame(
[(123, 2897402, 43.25, 2),
(124, 2897402, 49.11, 0),
(125, 2897402, 43.25, 2), 
(126, 2897402, 48.75, 0)]
, ['model_id','lab_test_id','summary_measure_value','reading_precision'])

expected_output

 -------- ----------- --------------------- ----------------- ------------- 
|model_id|lab_test_id|summary_measure_value|reading_precision|reading_value|
 -------- ----------- --------------------- ----------------- ------------- 
|     123|    2897402|                43.25|                2|        43.25|
|     124|    2897402|                49.11|                1|         49.1|
|     125|    2897402|                43.25|                2|        43.25|
|     126|    2897402|                48.75|                0|         49.0|
 -------- ----------- --------------------- ----------------- ------------- 

I have tried like below

df1 = df.withColumn("reading_value", f.round(f.col("summary_measure_value"), f.col("reading_precision")))

I am getting Column is not iterable error.

How can I achieve what I want

CodePudding user response:

You may try using a udf that uses python's built in round function to achieve this eg:

@f.udf
def udf_round(value,precision):
    try:
        precision = int(precision)
        value = float(value)
        # use python built-in round function to round values
        return round(value,precision)
    except:
        # decide what to return when you encounter bad data
        # in this example I've returned the original value
        return value

df=df.withColumn("reading_value",udf_round( f.col("summary_measure_value"),f.col("reading_precision") ))
df.show(truncate=False)

Outputs:

 -------- ----------- --------------------- ----------------- ------------- 
|model_id|lab_test_id|summary_measure_value|reading_precision|reading_value|
 -------- ----------- --------------------- ----------------- ------------- 
|123     |2897402    |43.25                |2                |43.25        |
|124     |2897402    |49.25                |0                |49.0         |
|125     |2897402    |43.25                |2                |43.25        |
|126     |2897402    |48.75                |0                |49.0         |
 -------- ----------- --------------------- ----------------- ------------- 
  • Related