Converting String to Integer Returns null in PySpark

Time:12-14

I am trying to convert a string to integer in my PySpark code.

Input: "1670900472389" (stored as a string)

I am doing this but it's returning null.

df = df.withColumn("lastupdatedtime_new",col("lastupdatedtime").cast(IntegerType()))

I have read the posts on Stack Overflow. They have quotes or commas in their input string causing this. However that's not the case with my input string. Any ideas what's happening?

Answer:

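The max value that a Java integer can hold is 2147483647, i.e. 2^31 - 1 (a 32-bit signed integer). Your value, 1670900472389, exceeds this, so the cast to IntegerType returns null.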
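
As a rough illustration of the range check, here is a minimal plain-Python sketch (no Spark needed). The helper name `smallest_spark_type` is hypothetical and only compares a numeric string against the signed 32-bit and 64-bit bounds; it is not part of PySpark.

```python
# Bounds of a 32-bit signed integer (IntegerType) and a 64-bit signed integer (LongType).
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def smallest_spark_type(value: str) -> str:
    """Hypothetical helper: name the narrowest integral type that can hold the value."""
    n = int(value)  # Python ints are arbitrary precision, so parsing never overflows
    if INT_MIN <= n <= INT_MAX:
        return "IntegerType"
    if LONG_MIN <= n <= LONG_MAX:
        return "LongType"
    return "too large for LongType"


print(smallest_spark_type("2147483647"))     # IntegerType
print(smallest_spark_type("1670900472389"))  # LongType
```

The value from the question falls outside the 32-bit range but well inside the 64-bit range.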

Use LongType instead:

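import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from pyspark.sql.types import LongType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(data=[["1670900472389"]], schema=["lastupdatedtime"])

df = df.withColumn("lastupdatedtime_new", F.col("lastupdatedtime").cast(LongType()))

df.show(truncate=False)
df.printSchema()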

Output:

+---------------+-------------------+
|lastupdatedtime|lastupdatedtime_new|
+---------------+-------------------+
|1670900472389  |1670900472389      |
+---------------+-------------------+

Schema:

root
 |-- lastupdatedtime: string (nullable = true)
 |-- lastupdatedtime_new: long (nullable = true)