Create a Spark DataFrame with thousands of columns and then add a column of ArrayType that holds them


I'd like to create a DataFrame in Spark using Scala, like this:

col_1 col_2 col_3 ... col_2048
0.123 0.234 ... ... 0.323
0.345 0.456 ... ... 0.534

Then I'd like to add an extra column of ArrayType that holds the data of all 2048 columns in a single column:

col_1 col_2 col_3 ... col_2048 array_col
0.123 0.234 ... ... 0.323 [0.123, 0.234, ..., 0.323]
0.345 0.456 ... ... 0.534 [0.345, 0.456, ..., 0.534]
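
For reference, one way to build a DataFrame this wide is to generate the schema and rows programmatically. This is a minimal Scala sketch, assuming a local SparkSession and random placeholder values; the two rows and the helper names are illustrative, not from the original post:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
import scala.util.Random

val spark = SparkSession.builder.appName("wide-df").master("local[*]").getOrCreate()

// 2048 double columns named col_1 .. col_2048
val numCols = 2048
val schema = StructType((1 to numCols).map(i => StructField(s"col_$i", DoubleType, nullable = false)))

// Two rows of random doubles as placeholder data
val rows = Seq.fill(2)(Row.fromSeq(Seq.fill(numCols)(Random.nextDouble())))

val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.show(2)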

CodePudding user response:

Try this:

import org.apache.spark.sql.functions.{array, col}

df.withColumn("array_col", array(df.columns.map(col): _*)).show()
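
To sanity-check this on a small example, here is a sketch with a hypothetical three-column DataFrame; the toy values mirror the tables above:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col}

val spark = SparkSession.builder.appName("array-col-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the wide DataFrame; only three columns for readability
val df = Seq((0.123, 0.234, 0.323), (0.345, 0.456, 0.534)).toDF("col_1", "col_2", "col_3")

// df.columns is read before any new column is added, so array_col holds exactly the original columns
df.withColumn("array_col", array(df.columns.map(col): _*)).show(false)
// array_col is [0.123, 0.234, 0.323] for the first row and [0.345, 0.456, 0.534] for the second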

CodePudding user response:

PySpark:

Create a column list and use Python's map:

from pyspark.sql import functions as f

cols = df.columns

df.withColumn('array_col', f.array(*map(lambda c: f.col(c), cols)))
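
Both answers do the same thing: every existing column is turned into a Column object and passed to the array function as varargs. In PySpark the * unpacks the mapped columns, which mirrors the Scala : _* expansion; f is assumed to be pyspark.sql.functions, imported as shown above.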