I'd like to create a DataFrame in Spark with Scala that looks like this:
| col_1 | col_2 | col_3 | ... | col_2048 |
|-------|-------|-------|-----|----------|
| 0.123 | 0.234 | ...   | ... | 0.323    |
| 0.345 | 0.456 | ...   | ... | 0.534    |
Then I'd like to add an extra column of ArrayType that holds the data from all 2048 columns in a single array:
| col_1 | col_2 | col_3 | ... | col_2048 | array_col                  |
|-------|-------|-------|-----|----------|----------------------------|
| 0.123 | 0.234 | ...   | ... | 0.323    | [0.123, 0.234, ..., 0.323] |
| 0.345 | 0.456 | ...   | ... | 0.534    | [0.345, 0.456, ..., 0.534] |
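For reference, a small stand-in for that DataFrame (only 4 columns instead of 2048, with made-up values) could be built like this:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("array-col-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the real 2048-column DataFrame: 4 double columns, made-up values
val df = Seq(
  (0.123, 0.234, 0.345, 0.323),
  (0.345, 0.456, 0.567, 0.534)
).toDF("col_1", "col_2", "col_3", "col_4")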
CodePudding user response:
Try this: collect every existing column into one ArrayType column with array and a varargs expansion:

import org.apache.spark.sql.functions.{array, col}

// Pack all existing columns into a single array column
df.withColumn("array_col", array(df.columns.map(col): _*)).show()
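Applied to a DataFrame like the one sketched in the question, this keeps the original columns and appends array_col with element type double. You can verify the result like this (assuming it is bound to a val named result):

val result = df.withColumn("array_col", array(df.columns.map(col): _*))
result.printSchema()              // array_col: array, element: double
result.show(truncate = false)     // avoid truncating the array in the output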
CodePudding user response:
PySpark:
Create a column list and use Python's map to turn each column name into a Column:

from pyspark.sql import functions as f

# Build the array column from every existing column
cols = df.columns
df.withColumn('array_col', f.array(*map(lambda c: f.col(c), cols)))
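In either language, if the DataFrame also carries columns that should not end up in the array (an id or label column, say), filter the column list first. A minimal Scala sketch, assuming a hypothetical "id" column:

import org.apache.spark.sql.functions.{array, col}

// Keep only the feature columns in the array; "id" is a hypothetical non-feature column
val featureCols = df.columns.filterNot(_ == "id").map(col)
val withArray = df.withColumn("array_col", array(featureCols: _*))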