How to select columns and cast column types in a pyspark dataframe?

Time:11-18

I have a very large pyspark dataframe from which I need to select many columns (which is why I want to use a for loop instead of writing out each column name). Most of those columns need to be cast to DoubleType(), except for one column, "ID", which I need to keep as a StringType().

To select all the columns that need to be cast to DoubleType() I use this code (it works):

from pyspark.sql.functions import col
from pyspark.sql.types import DoubleType

df_num2 = df_num1.select([col(c).cast(DoubleType()) for c in num_columns])

How can I also select my "ID" column, which is a StringType()?

CodePudding user response:

Use list concatenation in Python:

df_num2 = df_num1.select(["id"] + [col(c).cast(DoubleType()) for c in num_columns])

# OR, unpacking a generator into the list literal

df_num2 = df_num1.select(["id", *(col(c).cast(DoubleType()) for c in num_columns)])
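Both forms build the same list: the `id` column name followed by one cast expression per numeric column. The idea can be sketched in plain Python, with strings standing in for pyspark Column objects (the column names `price` and `qty` are hypothetical, for illustration only):

```python
# Hypothetical numeric column names standing in for num_columns.
num_columns = ["price", "qty"]

# One cast expression per numeric column; in pyspark this would be
# col(c).cast(DoubleType()) instead of a string.
casted = [f"CAST({c} AS DOUBLE)" for c in num_columns]

# Form 1: list concatenation with +
cols_concat = ["id"] + casted

# Form 2: unpacking an iterable inside a list literal (PEP 448)
cols_unpack = ["id", *casted]

# Both produce identical lists, with "id" first.
print(cols_concat)  # ['id', 'CAST(price AS DOUBLE)', 'CAST(qty AS DOUBLE)']
assert cols_concat == cols_unpack
```

Either list can then be passed straight to `df.select(...)`, since `select` accepts a mix of column-name strings and Column expressions.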