I am trying to concatonate multiple columns to just one column, but only if the column name is in a list. so issue = {'a','b','c'} is my list and would need to concatonate it as issue column with ; seperator.
I have tried: 1.
df_issue = df.withColumn('issue', concat_ws(';',map_values(custom.({issue}))))
Which returns invalid syntax error
2.
df_issue = df.withColumn('issue', lit(issue))
this just returnd a b c and not their value
Thank you
I have tried: 1.
df_issue = df.withColumn('issue', concat_ws(';',map_values(custom.({issue}))))
Which returns invalid syntax error
2.
df_issue = df.withColumn('issue', lit(issue))
this just returnd a b c and not their value
Thank you
CodePudding user response:
To concatenate multiple columns to one column, you can use the concat_ws() function in combination with the collect_list() function. The concat_ws() function concatenates the values of multiple columns into one column, using a specified separator. The collect_list() function collects the values of a specified column into a list.
Here is an example of how to use these functions to concatenate the values of the columns in the list issue into one column:
from pyspark.sql.functions import concat_ws, collect_list
df_issue = df.withColumn('issue', concat_ws(';', collect_list(issue)))
This code uses the concat_ws() and collect_list() functions to concatenate the values of the columns in the issue list into one column, using a semicolon as the separator. The resulting column will contain the concatenated values of the specified columns.
CodePudding user response:
You can simply use concat_ws
:
from pyspark.sql import functions as F
columns_to_concat = ['a', 'b', 'c']
df.withColumn('issue', F.concat_ws(';', *columns_to_concat))
So, if your input DataFrame
is:
--- --- --- ---------- ---------- -----
| a| b| c| date1| date2|value|
--- --- --- ---------- ---------- -----
| k1| k2| k3|2022-11-11|2022-11-14| 5|
| k4| k5| k6|2022-11-15|2022-11-19| 5|
| k7| k8| k9|2022-11-15|2022-11-19| 5|
--- --- --- ---------- ---------- -----
The previous code will produce:
--- --- --- ---------- ---------- ----- --------
| a| b| c| date1| date2|value| issue|
--- --- --- ---------- ---------- ----- --------
| k1| k2| k3|2022-11-11|2022-11-14| 5|k1;k2;k3|
| k4| k5| k6|2022-11-15|2022-11-19| 5|k4;k5;k6|
| k7| k8| k9|2022-11-15|2022-11-19| 5|k7;k8;k9|
--- --- --- ---------- ---------- ----- --------