I am using pyspark or pandas
I have this dataframe and each column contain a list of value:
COL1 | COL2 | COL3
["A","B"] | ["V1", "V2", "V3"] | ["V4","V5"]
I need to explode each column to present data differently and generate this dataframe, there is no data order between column :
COL1 | COL2 | COL3
A | V1 | V4
B | V2 | V5
null | V3 | null
thank you for your help
CodePudding user response:
Use pd.concat
:
out = pd.concat([sr.explode(ignore_index=True) for _, sr in df.iteritems()], axis=1)
print(out)
# Output:
COL1 COL2 COL3
0 A V1 V4
1 B V2 V5
2 NaN V3 NaN
CodePudding user response:
Try this:
df = df.apply(lambda row: row.apply(lambda cell: cell [np.nan] * (max(row.apply(len)) - len(cell))), axis=1).explode(df.columns.tolist())
Output:
>>> df
COL1 COL2 COL3
0 A V1 V4
0 B V2 V5
0 NaN V3 NaN
CodePudding user response:
Use Spark inbuilt functions arrays_zip and explode
to get the required output.
Example:
df.select(explode(arrays_zip(col("col1"),col("col2"),col("col3")))).\
select("col.*").\
show(10,False)
# ---- ---- ----
#|col1|col2|col3|
# ---- ---- ----
#|a |v1 |v4 |
#|b |v2 |v5 |
#|null|v3 |null|
# ---- ---- ----