Home > Back-end >  Concatenate column names in a new column based on their values
Concatenate column names in a new column based on their values

Time:06-08

id col1 col2 ones twos
1 0 1 col2
2 1 2 col1 col2
3 0 1 col2
4 1 1 col1, col2
5 2 2 col1,col2
6 0 1 col2

I need to collect the column names containing the 1s and 2s into one column each.

CodePudding user response:

concat_ws with a smart when condition may do it.

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [(1, 0, 1),
     (2, 1, 2),
     (3, 0, 1),
     (4, 1, 1),
     (5, 2, 2),
     (6, 0, 1)],
    ['id', 'col1', 'col2']
)
df = df.select(
    '*',
    F.concat_ws(', ', *[F.when(F.col(c) == 1, c) for c in {'col1', 'col2'}]).alias('ones'),
    F.concat_ws(', ', *[F.when(F.col(c) == 2, c) for c in {'col1', 'col2'}]).alias('twos'),
)
df.show()
#  --- ---- ---- ---------- ---------- 
# | id|col1|col2|      ones|      twos|
#  --- ---- ---- ---------- ---------- 
# |  1|   0|   1|      col2|          |
# |  2|   1|   2|      col1|      col2|
# |  3|   0|   1|      col2|          |
# |  4|   1|   1|col1, col2|          |
# |  5|   2|   2|          |col1, col2|
# |  6|   0|   1|      col2|          |
#  --- ---- ---- ---------- ---------- 
  • Related