Please consider the following simplified Pandas dataframe:
Name | Component |
---|---|
D800465 | [{'component': 'comp1', 'version': '1.0.0'}, {'component': 'comp2', 'version': '15.2.5'}] |
L932227 | [{'component': 'comp1', 'version': '1.0.0'}, {'component': 'comp2', 'version': '15.2.5'}, {'component': 'comp3', 'version': '2.5'}] |
L908041 | [{'component': 'comp1', 'version': '1.0.0'}] |
D797502 | [{'component': 'comp1', 'version': '1.0.0'}] |
As you can see, the 'Component' column contains lists of dictionaries whose length may vary. I want to do two things with this dataframe: create two new columns, 'ComponentName' and 'ComponentVersion', and create as many rows as necessary depending on the size of each list.
The expected output (with the same example as above) should look like this:
Name | ComponentName | ComponentVersion |
---|---|---|
D800465 | comp1 | 1.0.0 |
D800465 | comp2 | 15.2.5 |
L932227 | comp1 | 1.0.0 |
L932227 | comp2 | 15.2.5 |
L932227 | comp3 | 2.5 |
L908041 | comp1 | 1.0.0 |
D797502 | comp1 | 1.0.0 |
How can I achieve this? Thanks a lot.
CodePudding user response:
You can explode the 'Component' column to get one row per list item, and convert the dictionaries to columns with pandas.json_normalize:
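import pandas as pd

# one row per dictionary in 'Component'
df2 = df.explode('Component')
# pass a plain list so the normalized frame gets a fresh 0..n-1 index
df2 = (df2[['Name']].reset_index(drop=True)
       .join(pd.json_normalize(df2['Component'].tolist()))
      )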
output:
      Name component version
0  D800465     comp1   1.0.0
1  D800465     comp2  15.2.5
2  L932227     comp1   1.0.0
3  L932227     comp2  15.2.5
4  L932227     comp3     2.5
5  L908041     comp1   1.0.0
6  D797502     comp1   1.0.0
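
To get the exact column names asked for in the question, a rename can be added at the end. Here is a minimal self-contained sketch, assuming the dataframe is built from the sample table in the question; the variable names `exploded`, `parts` and `result` are only illustrative:

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Name": ["D800465", "L932227", "L908041", "D797502"],
    "Component": [
        [{"component": "comp1", "version": "1.0.0"},
         {"component": "comp2", "version": "15.2.5"}],
        [{"component": "comp1", "version": "1.0.0"},
         {"component": "comp2", "version": "15.2.5"},
         {"component": "comp3", "version": "2.5"}],
        [{"component": "comp1", "version": "1.0.0"}],
        [{"component": "comp1", "version": "1.0.0"}],
    ],
})

# One row per dictionary; reset the index so every row has a unique label.
exploded = df.explode("Component").reset_index(drop=True)

# Expand each dictionary into its own columns ('component', 'version').
parts = pd.json_normalize(exploded["Component"].tolist())

# Join on the shared 0..n-1 index and rename to the requested column names.
result = (
    exploded[["Name"]]
    .join(parts)
    .rename(columns={"component": "ComponentName", "version": "ComponentVersion"})
)
print(result)
```

Note that this assumes every list is non-empty: explode turns an empty list into a single NaN row, which would need to be dropped or filled before normalizing.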