Create column of decimal type


I would like to provide numbers when creating a Spark dataframe, but I have trouble providing decimal-type numbers.

This way the number gets truncated:

from pyspark.sql import functions as F

df = spark.createDataFrame([(10234567891023456789.5, )], ["numb"])
df = df.withColumn("numb_dec", F.col("numb").cast("decimal(30,1)"))
df.show(truncate=False)
# +---------------------+----------------------+
# |numb                 |numb_dec              |
# +---------------------+----------------------+
# |1.0234567891023456E19|10234567891023456000.0|
# +---------------------+----------------------+

This fails:

df = spark.createDataFrame([(10234567891023456789.5, )], "numb decimal(30,1)")
df.show(truncate=False)

TypeError: field numb: DecimalType(30,1) can not accept object 1.0234567891023456e+19 in type <class 'float'>

How can I correctly provide big decimal numbers so that they don't get truncated?

CodePudding user response:

This is likely because a Python float is a 64-bit double, which carries only about 15-17 significant digits, so your literal is already rounded before Spark ever sees it. You can try passing string values when creating the dataframe instead:

from pyspark.sql import functions as F

df = spark.createDataFrame([("10234567891023456789.5", )], ["numb"])

df = df.withColumn("numb_dec", F.col("numb").cast("decimal(30,1)"))
df.show(truncate=False)
# +----------------------+----------------------+
# |numb                  |numb_dec              |
# +----------------------+----------------------+
# |10234567891023456789.5|10234567891023456789.5|
# +----------------------+----------------------+
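
You can verify that the precision is lost in plain Python, before Spark is even involved:

# Python's 64-bit float carries only ~15-17 significant digits,
# so a 21-digit literal is rounded the moment it is parsed.
x = 10234567891023456789.5
print(x)  # 1.0234567891023456e+19 -- the trailing .5 is already gone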

CodePudding user response:

Try something like this:

from pyspark.sql.types import StructType, StructField, DecimalType
from decimal import Context

schema = StructType([StructField('numb', DecimalType(30, 1))])

# Build the value under a 30-digit-precision decimal context so no digits are lost
data = [(Context(prec=30, Emax=999, clamp=1).create_decimal('10234567891023456789.5'), )]

df = spark.createDataFrame(data=data, schema=schema)

df.show(truncate=False)

+----------------------+
|numb                  |
+----------------------+
|10234567891023456789.5|
+----------------------+
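
If you don't need an explicit context, a plain decimal.Decimal built from the string should work the same way (a minimal sketch, assuming the same spark session):

from decimal import Decimal
from pyspark.sql.types import StructType, StructField, DecimalType

schema = StructType([StructField('numb', DecimalType(30, 1))])

# Decimal parses the string exactly, so all 21 digits reach Spark intact
df = spark.createDataFrame([(Decimal('10234567891023456789.5'), )], schema)
df.show(truncate=False)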