Pandas - Create new rows and cols based on dict of a column

Time:06-28

Please consider the following simplified Pandas dataframe:

Name Component
D800465 [{'component': 'comp1', 'version': '1.0.0'}, {'component': 'comp2', 'version': '15.2.5'}]
L932227 [{'component': 'comp1', 'version': '1.0.0'}, {'component': 'comp2', 'version': '15.2.5'}, {'component': 'comp3', 'version': '2.5'}]
L908041 [{'component': 'comp1', 'version': '1.0.0'}]
D797502 [{'component': 'comp1', 'version': '1.0.0'}]

As you can see, the 'Component' column contains lists of dictionaries whose length may vary. I want to do two things with this dataframe: create two new columns, 'ComponentName' and 'ComponentVersion', and create as many rows as needed for each list, one row per dictionary.

The expected output (with the same example as above) should look like this:

Name ComponentName ComponentVersion
D800465 comp1 1.0.0
D800465 comp2 15.2.5
L932227 comp1 1.0.0
L932227 comp2 15.2.5
L932227 comp3 2.5
L908041 comp1 1.0.0
D797502 comp1 1.0.0

How can I achieve this? Thanks a lot.

CodePudding user response:

You can explode and convert the dictionaries to columns with pandas.json_normalize:

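import pandas as pd

# One row per dictionary in each 'Component' list
df2 = df.explode('Component')
# Expand the dictionaries into columns and join them back to 'Name'
df2 = (df2[['Name']].reset_index(drop=True)
       .join(pd.json_normalize(df2['Component']))
      )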

output:

      Name component version
0  D800465     comp1   1.0.0
1  D800465     comp2  15.2.5
2  L932227     comp1   1.0.0
3  L932227     comp2  15.2.5
4  L932227     comp3     2.5
5  L908041     comp1   1.0.0
6  D797502     comp1   1.0.0
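
The output above keeps the dictionary keys ('component', 'version') as column names, while the question asks for 'ComponentName' and 'ComponentVersion'. A minimal self-contained sketch of the same explode + json_normalize approach with a final rename could look like this. The sample frame is rebuilt from the data in the question, and the variable names (exploded, parts, result) are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["D800465", "L932227", "L908041", "D797502"],
    "Component": [
        [{"component": "comp1", "version": "1.0.0"},
         {"component": "comp2", "version": "15.2.5"}],
        [{"component": "comp1", "version": "1.0.0"},
         {"component": "comp2", "version": "15.2.5"},
         {"component": "comp3", "version": "2.5"}],
        [{"component": "comp1", "version": "1.0.0"}],
        [{"component": "comp1", "version": "1.0.0"}],
    ],
})

# One row per dictionary; reset the index so it lines up with json_normalize
exploded = df.explode("Component").reset_index(drop=True)

# Turn the list of dictionaries into a frame with 'component' and 'version' columns
parts = pd.json_normalize(exploded["Component"].tolist())

# Join back to 'Name' and rename to the column names requested in the question
result = exploded[["Name"]].join(parts).rename(
    columns={"component": "ComponentName", "version": "ComponentVersion"}
)
print(result)
```

Note that this assumes every list contains at least one dictionary. An empty list becomes NaN after explode, so such rows would probably need to be dropped or filled before calling json_normalize.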