Converting the comma separated values in a cell to different rows by duplicating the other column co-CodePudding

I have a dataset

Name    Type        Cluster      Value
ABC     AA,BB       AZ,YZ        15
LMN     CC,DD,EE    LM,LM,LM     20

with many other columns.

I want to convert it to a dataframe like:

Name    Type        Cluster     Value    TypeSubset    ClusterSubset
ABC     AA, BB      AZ, YZ       15       AA            AZ 
ABC     AA, BB      AZ, YZ       15       BB            YZ
LMN     CC,DD,EE    LM,LM,LM     20       CC            LM
LMN     CC,DD,EE    LM,LM,LM     20       DD            LM 
LMN     CC,DD,EE    LM,LM,LM     20       EE            LM

The dataframe can have many columns. But the Number of elements in Type and Cluster will be same. I just want them separated into different rows and duplicate all the other columns.

How can I do it in python.

I tried

df.set_index(['Type','Cluster'])
   .apply(lambda x: x.astype(str).str.split(',').explode())
   .reset_index())  ```

Not getting the desired result.

CodePudding user response：

assign new columns and explode in parallel.

(df.assign(TypeSubset=df['Type'].str.split(','),
           ClusterSubset=df['Cluster'].str.split(',')
           )
   .explode(['TypeSubset', 'ClusterSubset'])
)