Split a python pandas column using positional string in another column-CodePudding

I have a dataframe with the structure:

ID	Split	Data
1	GT:RC:BC:CN	1:4:5:3
2	GT:RC:CN	1:7:0
3	GT:BC	4:2

I would like to create n new columns and populate with the data in the Data column, where n is the total number of unique fields split by a colon in the Split column (in this case, this would be 4 new columns: GT, RC, BC, CN). The new columns should be populated with the corresponding data in the Data column, so for ID 3, only column GT and BC should be populated. I have tried using string splitting, but that doesn't take into account the correct column to move the data to.

The output should look like this:

ID	Split	Data	GT	RC	BC	CN
1	GT:RC:BC:CN	1:4:5:3	1	4	5	3
2	GT:RC:CN	1:7:0	1	7		0
3	GT:BC	4:2	4		2

CodePudding user response：

You can use:

out = df.join(pd.concat([pd.Series(d.split(':'), index=s.split(':'))
                         for s,d in zip(df['Split'], df['Data'])], axis=1).T)

output:

   ID        Split     Data GT   RC   BC   CN
0   1  GT:RC:BC:CN  1:4:5:3  1    4    5    3
1   2     GT:RC:CN    1:7:0  1    7  NaN    0
2   3        GT:BC      4:2  4  NaN    2  NaN