id | col1 | col2 | ones | twos |
---|---|---|---|---|
1 | 0 | 1 | col2 | |
2 | 1 | 2 | col1 | col2 |
3 | 0 | 1 | col2 | |
4 | 1 | 1 | col1, col2 | |
5 | 2 | 2 | | col1, col2 |
6 | 0 | 1 | col2 | |
For each row, I need to collect the names of the columns that contain a 1 into one column and the names of the columns that contain a 2 into another.
CodePudding user response:
`concat_ws` combined with a well-chosen `when` condition may do it.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # or reuse an existing session

df = spark.createDataFrame(
    [(1, 0, 1),
     (2, 1, 2),
     (3, 0, 1),
     (4, 1, 1),
     (5, 2, 2),
     (6, 0, 1)],
    ['id', 'col1', 'col2']
)

df = df.select(
    '*',
    # For each column, emit its name when the value matches, otherwise NULL;
    # concat_ws then joins whatever names remain. A list (not a set) keeps
    # the column order in the result deterministic.
    F.concat_ws(', ', *[F.when(F.col(c) == 1, c) for c in ['col1', 'col2']]).alias('ones'),
    F.concat_ws(', ', *[F.when(F.col(c) == 2, c) for c in ['col1', 'col2']]).alias('twos'),
)
df.show()
# +---+----+----+----------+----------+
# | id|col1|col2|      ones|      twos|
# +---+----+----+----------+----------+
# |  1|   0|   1|      col2|          |
# |  2|   1|   2|      col1|      col2|
# |  3|   0|   1|      col2|          |
# |  4|   1|   1|col1, col2|          |
# |  5|   2|   2|          |col1, col2|
# |  6|   0|   1|      col2|          |
# +---+----+----+----------+----------+
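The trick is that `F.when` without an `.otherwise` returns NULL whenever the condition fails, and `concat_ws` silently drops NULL inputs, so only the names of the matching columns are joined. If the real table has more flag columns than `col1` and `col2`, the same pattern can be generalized by deriving the list from `df.columns`; the sketch below is only an illustration under that assumption, reusing the DataFrame built above (which by now also carries `ones` and `twos`, hence the exclusion):

# Scan every column except id and the two result columns added above.
value_cols = [c for c in df.columns if c not in ('id', 'ones', 'twos')]

df.select(
    'id',
    F.concat_ws(', ', *[F.when(F.col(c) == 1, c) for c in value_cols]).alias('ones'),
    F.concat_ws(', ', *[F.when(F.col(c) == 2, c) for c in value_cols]).alias('twos'),
).show()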