I have a dataframe with the structure:
ID | Split | Data |
---|---|---|
1 | GT:RC:BC:CN | 1:4:5:3 |
2 | GT:RC:CN | 1:7:0 |
3 | GT:BC | 4:2 |
I would like to create n new columns and populate with the data in the Data column, where n is the total number of unique fields split by a colon in the Split column (in this case, this would be 4 new columns: GT, RC, BC, CN). The new columns should be populated with the corresponding data in the Data column, so for ID 3, only column GT and BC should be populated. I have tried using string splitting, but that doesn't take into account the correct column to move the data to.
The output should look like this:
ID | Split | Data | GT | RC | BC | CN |
---|---|---|---|---|---|---|
1 | GT:RC:BC:CN | 1:4:5:3 | 1 | 4 | 5 | 3 |
2 | GT:RC:CN | 1:7:0 | 1 | 7 | 0 | |
3 | GT:BC | 4:2 | 4 | 2 |
CodePudding user response:
You can use:
out = df.join(pd.concat([pd.Series(d.split(':'), index=s.split(':'))
for s,d in zip(df['Split'], df['Data'])], axis=1).T)
output:
ID Split Data GT RC BC CN
0 1 GT:RC:BC:CN 1:4:5:3 1 4 5 3
1 2 GT:RC:CN 1:7:0 1 7 NaN 0
2 3 GT:BC 4:2 4 NaN 2 NaN