How to pass list to the selectExpr method in pyspark?

Time:12-03

The question seems simple, but I can't find an easy way to solve it.

I'm trying to dynamically create new columns in selectExpr, but it won't accept a list as an argument. What would be the best way to achieve this? (Chaining multiple withColumn calls is not an option because it causes a StackOverflowException.)

Input:

a | b
-------
1 | zzz
2 | xxx

I tried something like this:

sample_new_cols = {"s":"ran-s", 
                  "ts": "current_timestamp()",
                  }

df = df.selectExpr('*',
            [
                f"{definition} as {name}"
                for name, definition in sample_new_cols.items()
            ]
        )

and the expected output would be:

a | b   | s     | ts
----------------------------------------
1 | zzz | ran-s | 2021-12-01 08:10:21
2 | xxx | ran-s | 2021-12-01 08:10:21

Answer:

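You almost got it:

  • for static string columns, the value must be quoted inside the expression (e.g. 'ran-s'), otherwise Spark parses it as a column expression rather than a string literal
  • selectExpr takes each expression as a separate argument, so unpack the list with an asterisk * in front of it
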
sample_new_cols = {
    "s": "'ran-s'",
    "ts": "current_timestamp()",
}

df1 = df.selectExpr('*', *[
    f"{definition} as {name}"
    for name, definition in sample_new_cols.items()
])

df1.show()

#+---+---+-----+-----------------------+
#|a  |b  |s    |ts                     |
#+---+---+-----+-----------------------+
#|1  |zzz|ran-s|2021-12-01 14:23:14.779|
#|2  |xxx|ran-s|2021-12-01 14:23:14.779|
#+---+---+-----+-----------------------+
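
If the column definitions come from elsewhere in your code, it may help to build the expression strings separately so you can inspect them before handing them to Spark. The following is only a minimal sketch of that idea: the helper names build_select_exprs and sql_string_literal are made up for illustration (they are not part of PySpark), and the quoting helper assumes the value itself contains no single quotes or backslashes.

```python
def sql_string_literal(value):
    """Wrap a plain Python string in single quotes so Spark SQL treats it
    as a string literal. Hypothetical helper; assumes `value` contains no
    single quotes or backslashes (no escaping is attempted)."""
    return f"'{value}'"


def build_select_exprs(new_cols):
    """Turn {column_name: sql_expression} into a list of
    'expression as column_name' strings. Hypothetical helper."""
    return [f"{definition} as {name}" for name, definition in new_cols.items()]


sample_new_cols = {
    "s": sql_string_literal("ran-s"),   # static string, so it gets quoted
    "ts": "current_timestamp()",        # SQL function call, left unquoted
}

exprs = build_select_exprs(sample_new_cols)
print(exprs)

# With an existing DataFrame `df`, the list is unpacked with * so that each
# expression becomes its own positional argument:
# df1 = df.selectExpr("*", *exprs)
```

Printing exprs first makes it easy to spot an unquoted literal before Spark raises an analysis error. As far as I know, selectExpr also accepts a single list as its only argument, so df.selectExpr(["*"] + exprs) should be an equivalent spelling, but check this against your PySpark version.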