I have a dataset
Name Type Cluster Value
ABC AA,BB AZ,YZ 15
LMN CC,DD,EE LM,LM,LM 20
with many other columns.
I want to convert it to a dataframe like:
Name Type Cluster Value TypeSubset ClusterSubset
ABC AA, BB AZ, YZ 15 AA AZ
ABC AA, BB AZ, YZ 15 BB YZ
LMN CC,DD,EE LM,LM,LM 20 CC LM
LMN CC,DD,EE LM,LM,LM 20 DD LM
LMN CC,DD,EE LM,LM,LM 20 EE LM
The dataframe can have many columns. But the Number of elements in Type
and Cluster
will be same. I just want them separated into different rows and duplicate all the other columns.
How can I do it in python.
I tried
df.set_index(['Type','Cluster'])
.apply(lambda x: x.astype(str).str.split(',').explode())
.reset_index()) ```
Not getting the desired result.
CodePudding user response:
assign
new columns and explode
in parallel.
(df.assign(TypeSubset=df['Type'].str.split(','),
ClusterSubset=df['Cluster'].str.split(',')
)
.explode(['TypeSubset', 'ClusterSubset'])
)