id | col1 | col2 | ones | twos |
---|---|---|---|---|
1 | 0 | 1 | col2 | |
2 | 1 | 2 | col1 | col2 |
3 | 0 | 1 | col2 | |
4 | 1 | 1 | col1, col2 | |
5 | 2 | 2 | | col1, col2 |
6 | 0 | 1 | col2 | |
For each row, I need to collect the names of the columns that contain a 1 into one column and the names of the columns that contain a 2 into another.
CodePudding user response:
`concat_ws` combined with a well-chosen `when` condition may do it.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # or reuse an existing session

df = spark.createDataFrame(
    [(1, 0, 1),
     (2, 1, 2),
     (3, 0, 1),
     (4, 1, 1),
     (5, 2, 2),
     (6, 0, 1)],
    ['id', 'col1', 'col2']
)

df = df.select(
    '*',
    # For each column, emit its name when the value matches, otherwise NULL;
    # concat_ws then joins whatever names remain. A list (not a set) keeps
    # the column order in the result deterministic.
    F.concat_ws(', ', *[F.when(F.col(c) == 1, c) for c in ['col1', 'col2']]).alias('ones'),
    F.concat_ws(', ', *[F.when(F.col(c) == 2, c) for c in ['col1', 'col2']]).alias('twos'),
)
df.show()
# +---+----+----+----------+----------+
# | id|col1|col2|      ones|      twos|
# +---+----+----+----------+----------+
# |  1|   0|   1|      col2|          |
# |  2|   1|   2|      col1|      col2|
# |  3|   0|   1|      col2|          |
# |  4|   1|   1|col1, col2|          |
# |  5|   2|   2|          |col1, col2|
# |  6|   0|   1|      col2|          |
# +---+----+----+----------+----------+
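The trick is that `F.when` without an `.otherwise` returns NULL whenever the condition fails, and `concat_ws` silently drops NULL inputs, so only the names of the matching columns are joined. If the real table has more flag columns than `col1` and `col2`, the same pattern can be generalized by deriving the list from `df.columns`; the sketch below is only an illustration under that assumption, reusing the DataFrame built above (which by now also carries `ones` and `twos`, hence the exclusion):

# Scan every column except id and the two result columns added above.
value_cols = [c for c in df.columns if c not in ('id', 'ones', 'twos')]

df.select(
    'id',
    F.concat_ws(', ', *[F.when(F.col(c) == 1, c) for c in value_cols]).alias('ones'),
    F.concat_ws(', ', *[F.when(F.col(c) == 2, c) for c in value_cols]).alias('twos'),
).show()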