I'm trying to change the datatype of multiple columns (100 columns) with PySpark. I'm trying to write a loop or something else that can help change all 100 columns. Any help will be appreciated. This is the syntax that helped me change 3 columns:
from pyspark.sql.types import IntegerType

dfcontract2 = dfcontract \
    .withColumn("Offre durable",
                dfcontract["Offre durable"].cast(IntegerType())) \
    .withColumn("Offre non durable",
                dfcontract["Offre non durable"].cast(IntegerType())) \
    .withColumn("Total",
                dfcontract["Total"].cast(IntegerType()))

dfcontract2.printSchema()
CodePudding user response:
You can use a list comprehension.
from pyspark.sql import functions as func

list_of_cols_to_update = ['col2', 'col3', 'col4']  # specify the columns that need casting

data_sdf = data_sdf. \
    select(*[func.col(k).cast('int').alias(k) if k in list_of_cols_to_update else k for k in data_sdf.columns])
Let's print the list from the comprehension to see how it looks:
print([func.col(k).cast('int').alias(k) if k in list_of_cols_to_update else k for k in data_sdf.columns])
# ['col1', Column<'CAST(col2 AS INT) AS `col2`'>, Column<'CAST(col3 AS INT) AS `col3`'>, Column<'CAST(col4 AS INT) AS `col4`'>]
The list_of_cols_to_update list can be generated with a list comprehension as well:

list_of_cols_to_update = ['col' + str(i) for i in range(2, 5)]
print(list_of_cols_to_update)
# ['col2', 'col3', 'col4']
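Applied to the question's DataFrame, a minimal sketch could look like the following. Here dfcontract comes from the question, while cols_to_keep_as_is is a hypothetical exclusion list you would replace with whichever of your 100 columns should not be cast:

from pyspark.sql import functions as func

# hypothetical list of columns that should keep their current type
cols_to_keep_as_is = ['contract_id']

list_of_cols_to_update = [c for c in dfcontract.columns if c not in cols_to_keep_as_is]

dfcontract2 = dfcontract.select(
    *[func.col(c).cast('int').alias(c) if c in list_of_cols_to_update else c
      for c in dfcontract.columns]
)
dfcontract2.printSchema()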
CodePudding user response:
You can employ reduce in conjunction with withColumn to do this:
DataFrame - Reduce
from functools import reduce
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

# map every column to its current type, then pick the ones to cast
schema = {col: col_type for col, col_type in sparkDF.dtypes}
cast_cols = [col for col, col_type in schema.items() if col_type in ["bigint"]]

# fold over the columns, adding one cast per step
sparkDF = reduce(
    lambda df, x: df.withColumn(x, F.col(x).cast(IntegerType())),
    cast_cols,
    sparkDF,
)
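As a quick usage sketch (the toy DataFrame and column names below are invented for illustration, not taken from the question), the same fold turns every bigint column into an integer while leaving the others untouched:

from functools import reduce
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

# toy DataFrame -- Python integer literals become bigint (long) columns by default
sparkDF = spark.createDataFrame([(1, 10, "a"), (2, 20, "b")], ["col1", "col2", "col3"])
sparkDF.printSchema()   # col1: long, col2: long, col3: string

cast_cols = [col for col, col_type in sparkDF.dtypes if col_type == "bigint"]

sparkDF = reduce(
    lambda df, x: df.withColumn(x, F.col(x).cast(IntegerType())),
    cast_cols,
    sparkDF,
)
sparkDF.printSchema()   # col1: integer, col2: integer, col3: string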