How to show data with a different number of columns per row in Pandas?

I load a CSV file whose rows have different numbers of columns, so I get this error:

Expected 12 fields in line 29, saw 13
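Roughly speaking, the default C parser takes the width of the first row as the expected number of fields and rejects any longer row. A minimal sketch that reproduces the error on a small in-memory CSV; the sample text and the helper name parse_error_message are made up for illustration:

```python
import io

import pandas as pd


def parse_error_message(text):
    """Return the ParserError message pandas raises for `text`, or None."""
    try:
        pd.read_csv(io.StringIO(text), header=None)
    except pd.errors.ParserError as exc:
        return str(exc)
    return None


# Hypothetical ragged input: the third line has one field more than the first.
message = parse_error_message("1,2\n3,4\n5,6,7\n")
print(message)
```

The message should have the same "Expected 2 fields in line ..., saw 3" shape as the one in the question.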

To avoid this error, I use the hack names=range(24), which makes pandas pad shorter rows with NaN:

    df = pd.read_csv(filename, header=None, quoting=csv.QUOTE_NONE, dtype='object', sep=data_file_delimiter, engine='python', encoding="utf-8", names=range(24))
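A small sketch of what the names=range(24) hack does, using a hypothetical three-row input and names=range(4) instead of 24. Rows shorter than the supplied names are padded with NaN, so len(df.columns) reports the padded width, while counting non-missing cells per row recovers each row's real width. This assumes no row has empty fields in the middle, since those are read as NaN as well:

```python
import io

import pandas as pd

# Hypothetical ragged input with rows of 2, 3 and 1 fields.
ragged = "1,2\n3,4,5\n6\n"

# More names than the widest row: pandas pads shorter rows with NaN
# instead of raising, which is what names=range(24) relies on.
df = pd.read_csv(io.StringIO(ragged), header=None, names=range(4), dtype="object")

print(len(df.columns))  # 4: the padded width, not any row's real width

# Real width of each row = number of non-missing cells in it.
counts = df.notna().sum(axis=1)
print(counts.tolist())  # [2, 3, 1]
```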

The problem is that I need to know the real number of columns in each row, so that I can group the rows into a dict keyed by that number:

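    data = {}

    for _, row in df.iterrows():
        # Drop the NaN padding that names=range(24) adds to short rows
        # (this also drops genuinely empty fields inside a row).
        fields = row.dropna().tolist()
        if len(fields) not in data:
            data[len(fields)] = []
        data[len(fields)].append(fields)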
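If the rows only need to be grouped by width, one alternative (a different technique from the pandas loop, swapped in here) is to skip the padding entirely and let the standard-library csv module report each row's length. A sketch, where group_rows_by_width is a hypothetical helper and the sample input is made up:

```python
import csv
import io
from collections import defaultdict


def group_rows_by_width(lines, delimiter=","):
    """Group parsed CSV rows by their number of fields (hypothetical helper)."""
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        if row:  # csv.reader yields [] for blank lines; skip them
            groups[len(row)].append(row)
    return dict(groups)


sample = io.StringIO("1,2\n3,4,5\n\n6,7\n")
print(group_rows_by_width(sample))
# {2: [['1', '2'], ['6', '7']], 3: [['3', '4', '5']]}
```

For a real file, opening it with open(filename, newline="", encoding="utf-8") and passing the file object plus data_file_delimiter should give a dict of the same shape as the loop in the question.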

Answer:

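You can get the number of columns with len(df.columns), although with names=range(24) that is always 24 (the padded width), not the number of fields in each row. If you only want to convert a pandas DataFrame to a dictionary, DataFrame.to_dict() already covers that, with several output orientations: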

   import pandas as pd

   df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

   df
         col1  col2
   row1     1  0.50
   row2     2  0.75

   df.to_dict()
   {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}

   # You can specify the return orientation.

   df.to_dict('series')
   {'col1': row1    1
            row2    2
   Name: col1, dtype: int64,
   'col2': row1    0.50
           row2    0.75
   Name: col2, dtype: float64}

   df.to_dict('split')
   {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
   'data': [[1, 0.5], [2, 0.75]]}

   df.to_dict('records')
   [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]

   df.to_dict('index')
   {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}

   df.to_dict('tight')
   {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
   'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}

   # You can also specify the mapping type.

   from collections import OrderedDict, defaultdict

   df.to_dict(into=OrderedDict)
   OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
         ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])

Taken from the pandas documentation for DataFrame.to_dict.
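Because len(df.columns) only reports the padded width, to_dict alone does not produce the grouping the question asks for. A vectorized sketch that counts non-missing cells per row, groups on that count, and trims each group back to its real width. It assumes the padding from names=range(...) is always on the right and that rows contain no empty fields in the middle; the sample data is hypothetical:

```python
import io

import pandas as pd

# Hypothetical ragged input, read with the same padding hack as the question.
ragged = "1,2\n3,4,5\n6,7\n"
df = pd.read_csv(io.StringIO(ragged), header=None, names=range(4), dtype="object")

# Real width of each row = number of non-missing cells.
counts = df.notna().sum(axis=1)

# Group rows by width and keep only the first n (real) columns of each group.
data = {
    int(n): group.iloc[:, :n].values.tolist()
    for n, group in df.groupby(counts)
}
print(data)  # {2: [['1', '2'], ['6', '7']], 3: [['3', '4', '5']]}
```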