Converting dict to DataFrame gives too many rows

Time:12-02

I am trying to convert a dict to a pandas DataFrame as follows:

import pandas as pd

dff = pd.DataFrame(
{
'CEO': 'ucMMe Mhll', 
'address': 'vs5dlt3 B Se1kC eve0nre', 
'address2': '-', 
'city': 'a CSatanral', 
'companyName': 'Agilent Technologies Inc.', 
'country': 'nUatei tdetSs', 
'description': "tns oo el' yty", 
'employees': 17124, 
'exc': 'gdgdgd', 
'industry': 'sgeiTeotiroaLbtans r', 
'issueType': 'abc', 
'phone': '14087832319', 
'primarySicCode': 4008, 
'sector': ',atnSii Scilcofe,nnse TecisaPliinafs cedorhv cre', 
'securityName': 'elooIne.nen htc iisTcgAgl', 
'state': 'ailairofnC', 
'symbol': 'A', 
'tags': ['nllh he', 'gth', 'acsl', 'isiad', 'nr aitT'], 
'website': 'win.gcm.', 
'zip': '0752501-19'} )

When I print the DataFrame, I see the following output:

print(dff)

[Screenshot of the printed DataFrame, showing 5 rows]

I expected to see only 1 row in the DataFrame, but it has 5, and I cannot understand why. What am I doing wrong here?
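A cut-down version of the dict reproduces the same effect. This is a minimal sketch, assuming a hypothetical dict named record with two scalar fields and one 5-item list (the values are made up, not taken from the question):

```python
import pandas as pd

# Hypothetical, cut-down stand-in for the dict in the question:
# two scalar fields plus one list-valued field with 5 items.
record = {
    "symbol": "A",
    "employees": 17124,
    "tags": ["t1", "t2", "t3", "t4", "t5"],
}

df = pd.DataFrame(record)

# The list becomes a 5-row column; each scalar is repeated on every row.
print(df.shape)  # (5, 3)
print(df)
```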

Answer:

You're not doing anything wrong. Because tags is a list of 5 items, pandas treats it as a column of 5 values and broadcasts every other (scalar) field to that same length, so you get 5 rows. You can do:

pd.Series(your_dict).to_frame().T
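One side effect of the Series route is worth knowing: a Series built from mixed values has object dtype, so every column of the transposed frame is object as well. The sketch below assumes a hypothetical dict named record and uses DataFrame.infer_objects() to re-infer per-column dtypes afterwards:

```python
import pandas as pd

# Hypothetical, cut-down stand-in for the dict in the question.
record = {"symbol": "A", "employees": 17124, "tags": ["t1", "t2", "t3"]}

# Series -> one-column frame -> transpose: the keys become columns, one row.
raw = pd.Series(record).to_frame().T
print(raw.shape)   # (1, 3)
print(raw.dtypes)  # all object, because the Series held mixed types

# infer_objects() re-infers a dtype per column (employees becomes int64).
typed = raw.infer_objects()
print(typed.dtypes)
```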

Or wrap your dict in a list, [], to indicate that it is a single row (records orientation):

pd.DataFrame([your_dict])
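A short sketch of the records form, assuming a hypothetical dict named record: a list holding one dict yields one row, the whole tags list is stored in a single cell, and dtypes are inferred column by column.

```python
import pandas as pd

# Hypothetical, cut-down stand-in for the dict in the question.
record = {"symbol": "A", "employees": 17124, "tags": ["t1", "t2", "t3"]}

# A list of dicts is read as a list of records: one dict -> one row.
df = pd.DataFrame([record])

print(df.shape)               # (1, 3)
print(df.loc[0, "tags"])      # ['t1', 't2', 't3']: the whole list sits in one cell
print(df["employees"].dtype)  # int64
```

The same call with several dicts in the list produces one row per dict.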

Answer:

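This is because your tags value is a list of 5 items, so pandas builds 5 rows and fills in the other columns by repeating their single values. To fix this, put a second layer of brackets around the list, so pandas treats it as one row whose cell holds the whole list, not as 5 rows.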

dff = pd.DataFrame(
{
'CEO': 'ucMMe Mhll', 
'address': 'vs5dlt3 B Se1kC eve0nre', 
'address2': '-', 
'city': 'a CSatanral', 
'companyName': 'Agilent Technologies Inc.', 
'country': 'nUatei tdetSs', 
'description': "tns oo el' yty", 
'employees': 17124, 
'exc': 'gdgdgd', 
'industry': 'sgeiTeotiroaLbtans r', 
'issueType': 'abc', 
'phone': '14087832319', 
'primarySicCode': 4008, 
'sector': ',atnSii Scilcofe,nnse TecisaPliinafs cedorhv cre', 
'securityName': 'elooIne.nen htc iisTcgAgl', 
'state': 'ailairofnC', 
'symbol': 'A', 
'tags': [['nllh he', 'gth', 'acsl', 'isiad', 'nr aitT']], # Outer list = 1 row; inner list = the value stored in that cell
'website': 'win.gcm.', 
'zip': '0752501-19'} )
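A smaller sketch of the double-bracket fix, assuming a hypothetical tags list and two made-up scalar fields: the outer list has length 1, which sets the row count, and its only element is the inner list that ends up in the cell.

```python
import pandas as pd

# Hypothetical tag values, standing in for the ones in the question.
tags = ["t1", "t2", "t3", "t4", "t5"]

# Outer list = the column (length 1 -> one row); inner list = that cell's value.
df = pd.DataFrame({"symbol": "A", "employees": 17124, "tags": [tags]})

print(df.shape)           # (1, 3)
print(df.loc[0, "tags"])  # ['t1', 't2', 't3', 't4', 't5']
```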

Answer:

You could wrap each dictionary value in a one-element list (here dct is the dict from the question):

dff = pd.DataFrame({k: [v] for k,v in dct.items()})

>>> dff
          CEO                  address  ...   website         zip
0  ucMMe Mhll  vs5dlt3 B Se1kC eve0nre  ...  win.gcm.  0752501-19

[1 rows x 20 columns]
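Wrapping every value also covers a dict with no list-valued fields at all. Passing only scalars to pd.DataFrame raises a ValueError, because there is no length to broadcast to. The sketch below assumes two hypothetical dicts, scalars and mixed, and applies the same dict comprehension to both:

```python
import pandas as pd

# Hypothetical dicts: one with only scalars, one that also has a list field.
scalars = {"symbol": "A", "employees": 17124}
mixed = {**scalars, "tags": ["t1", "t2"]}

# Only scalars and no index: pandas cannot tell how many rows to build.
try:
    pd.DataFrame(scalars)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # If using all scalar values, you must pass an index

# One-element lists give exactly one row in both cases.
rows = [pd.DataFrame({k: [v] for k, v in d.items()}) for d in (scalars, mixed)]
print([r.shape for r in rows])  # [(1, 2), (1, 3)]
```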