I have a very large PySpark DataFrame from which I need to select many columns, so I want to use a loop (a list comprehension) instead of writing out each column name. Most of those columns need to be cast to DoubleType(), except for one column, "ID", which I need to keep as StringType().
To select all the columns that need to be cast to DoubleType(), I use this code (it works):
df_num2 = df_num1.select([col(c).cast(DoubleType()) for c in num_columns])
How can I also select the "ID" column, which is a StringType()?
Answer:
Build a single list for select(), either with Python list concatenation or by unpacking the cast columns into one list literal:
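from pyspark.sql.functions import col
from pyspark.sql.types import DoubleType
# Concatenate a one-element list holding "ID" with the list of cast columns (note the +)
df_num2 = df_num1.select(["ID"] + [col(c).cast(DoubleType()) for c in num_columns])
# OR: unpack the cast columns into the same list literal
df_num2 = df_num1.select(["ID", *(col(c).cast(DoubleType()) for c in num_columns)])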
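Both forms are plain Python list building: select() accepts a list that mixes column-name strings and Column expressions. The sketch below checks that the two forms produce the same list. To keep it runnable without a Spark session, it replaces col(c).cast(DoubleType()) with a hypothetical helper `cast_expr` that only returns a string, and `num_columns` holds made-up names.

```python
def cast_expr(name: str) -> str:
    # Hypothetical stand-in for col(name).cast(DoubleType()); returns a string, not a Column.
    return f"CAST({name} AS DOUBLE)"

num_columns = ["a", "b", "c"]  # hypothetical numeric column names

# Form 1: concatenate a one-element list with the list comprehension.
concatenated = ["ID"] + [cast_expr(c) for c in num_columns]

# Form 2: unpack a generator expression inside a single list literal.
unpacked = ["ID", *(cast_expr(c) for c in num_columns)]

print(concatenated)
print(concatenated == unpacked)
```

Without the `+`, `["ID"] [...]` is parsed as indexing the list `["ID"]` with another list, which raises a TypeError at runtime instead of combining the two lists.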