df = spark.read.csv("Sales_December.csv", header=True)
df.printSchema()
printSchema() reports Order Id as a string column.
I want to change the schema so that Order Id is an int column.
Answer:
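Cast the column with withColumn. The func alias has to be imported, and the column name should be written exactly as it appears in the CSV header:
from pyspark.sql import functions as func
from pyspark.sql.types import IntegerType
# Replace the string column with an integer-typed version of itself
df = df.withColumn('Order Id', func.col('Order Id').cast(IntegerType()))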
Answer:
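You can use withColumn.
In PySpark its signature is withColumn(colName: str, col: Column) -> DataFrame. It returns a new DataFrame with the column added or, if a column named colName already exists, replaced.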
For example:
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType
df2 = df.withColumn("Order Id", col("Order Id").cast(IntegerType()))
df2.printSchema()