I'd like to create a DataFrame in Spark with Scala that looks like this:
| col_1 | col_2 | col_3 | ... | col_2048 |
|-------|-------|-------|-----|----------|
| 0.123 | 0.234 | ...   | ... | 0.323    |
| 0.345 | 0.456 | ...   | ... | 0.534    |
Then I'd like to add an extra column of ArrayType that holds the data from all 2048 columns in a single array:
| col_1 | col_2 | col_3 | ... | col_2048 | array_col                  |
|-------|-------|-------|-----|----------|----------------------------|
| 0.123 | 0.234 | ...   | ... | 0.323    | [0.123, 0.234, ..., 0.323] |
| 0.345 | 0.456 | ...   | ... | 0.534    | [0.345, 0.456, ..., 0.534] |
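For reference, a small stand-in for that DataFrame (only 4 columns instead of 2048, with made-up values) could be built like this:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("array-col-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the real 2048-column DataFrame: 4 double columns, made-up values
val df = Seq(
  (0.123, 0.234, 0.345, 0.323),
  (0.345, 0.456, 0.567, 0.534)
).toDF("col_1", "col_2", "col_3", "col_4")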
CodePudding user response:
Try this: collect every existing column into one ArrayType column with array and a varargs expansion:

import org.apache.spark.sql.functions.{array, col}

// Pack all existing columns into a single array column
df.withColumn("array_col", array(df.columns.map(col): _*)).show()
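Applied to a DataFrame like the one sketched in the question, this keeps the original columns and appends array_col with element type double. You can verify the result like this (assuming it is bound to a val named result):

val result = df.withColumn("array_col", array(df.columns.map(col): _*))
result.printSchema()              // array_col: array, element: double
result.show(truncate = false)     // avoid truncating the array in the output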
CodePudding user response:
PySpark:
Create a column list and use Python's map to turn each column name into a Column:

from pyspark.sql import functions as f

# Build the array column from every existing column
cols = df.columns
df.withColumn('array_col', f.array(*map(lambda c: f.col(c), cols)))
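In either language, if the DataFrame also carries columns that should not end up in the array (an id or label column, say), filter the column list first. A minimal Scala sketch, assuming a hypothetical "id" column:

import org.apache.spark.sql.functions.{array, col}

// Keep only the feature columns in the array; "id" is a hypothetical non-feature column
val featureCols = df.columns.filterNot(_ == "id").map(col)
val withArray = df.withColumn("array_col", array(featureCols: _*))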