The question seems simple, but I can't find an easy way to solve it.
I'm trying to dynamically create new columns in selectExpr, but it won't accept a list as an argument. What would be the best way to achieve this? (Chaining multiple withColumn calls is not an option because of a StackOverflowException.)
Input:
a | b
-------
1 | zzz
2 | xxx
I tried something like this:
sample_new_cols = {"s": "ran-s",
                   "ts": "current_timestamp()",
                   }
df = df.selectExpr('*',
                   [
                       f"{definition} as {name}"
                       for name, definition in sample_new_cols.items()
                   ]
                   )
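A minimal pure-Python sketch of why this call is rejected, assuming only that selectExpr collects its expressions as varargs (its signature is selectExpr(*expr)). The function select_expr_stub is a hypothetical stand-in that just returns the tuple of arguments it received; it is not part of PySpark.

```python
def select_expr_stub(*expr):
    """Hypothetical stand-in mimicking how DataFrame.selectExpr(*expr)
    receives its arguments: it simply returns the tuple it was given."""
    return expr


sample_new_cols = {"s": "ran-s", "ts": "current_timestamp()"}
exprs = [f"{definition} as {name}" for name, definition in sample_new_cols.items()]

# Without unpacking: the second argument is the whole list, not a string.
without_star = select_expr_stub('*', exprs)
print(without_star)  # ('*', ['ran-s as s', 'current_timestamp() as ts'])

# With unpacking: each expression arrives as its own string argument.
with_star = select_expr_stub('*', *exprs)
print(with_star)     # ('*', 'ran-s as s', 'current_timestamp() as ts')
```

In the first call, selectExpr would see a mix of a string and a list; in the second it sees only strings, which is what it expects.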
and the expected output would be:
a | b   | s     | ts
--|-----|-------|--------------------
1 | zzz | ran-s | 2021-12-01 08:10:21
2 | xxx | ran-s | 2021-12-01 08:10:21
Answer:
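You almost got it:
- for static string column definitions, you need to quote the values (e.g. 'ran-s'); otherwise Spark SQL parses ran-s as the expression ran - s, a subtraction between two columns named ran and s;
- in selectExpr, you need to put an asterisk * before the list of expressions so that it is unpacked into separate string arguments.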
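If the static values come from data rather than being typed by hand, quoting them by hand gets error-prone. A minimal sketch, assuming you keep plain values and real SQL expressions in two separate dicts; sql_string_literal is a hypothetical helper (not a PySpark API) and assumes Spark's default backslash escaping (spark.sql.parser.escapedStringLiterals=false), so backslashes and single quotes inside a value are escaped with a backslash.

```python
def sql_string_literal(value: str) -> str:
    """Hypothetical helper: wrap a Python string as a Spark SQL string literal.

    Assumes default backslash escaping, so backslashes and single quotes
    inside the value are escaped with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


static_cols = {"s": "ran-s"}                # plain values: must be quoted
sql_cols = {"ts": "current_timestamp()"}    # SQL expressions: used as-is

exprs = [f"{sql_string_literal(v)} as {k}" for k, v in static_cols.items()]
exprs += [f"{e} as {k}" for k, e in sql_cols.items()]
print(exprs)  # ["'ran-s' as s", 'current_timestamp() as ts']
```

The resulting list would still be unpacked into the call, e.g. df.selectExpr('*', *exprs).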
sample_new_cols = {
    "s": "'ran-s'",               # static string value, quoted as a SQL literal
    "ts": "current_timestamp()",  # SQL expression, used as-is
}
df1 = df.selectExpr('*', *[
    f"{definition} as {name}"
    for name, definition in sample_new_cols.items()
])
df1.show(truncate=False)
# +---+---+-----+-----------------------+
# |a  |b  |s    |ts                     |
# +---+---+-----+-----------------------+
# |1  |zzz|ran-s|2021-12-01 14:23:14.779|
# |2  |xxx|ran-s|2021-12-01 14:23:14.779|
# +---+---+-----+-----------------------+