I load CSV document with different number of columns. Therefore I got this error:
Expected 12 fields in line 29, saw 13
To avoid this error I use the hack names=range(24)
df = pd.read_csv(filename, header=None, quoting=csv.QUOTE_NONE, dtype='object', sep=data_file_delimiter, engine='python', encoding = "utf-8", names=range(24))
Problem is I need to know the real number of columns to group this data further into dict data
:
data = {}
for row in df.rows:
line = line.strip()
row = line.split(' ')
if len(row) not in data:
data[ len(row) ] = []
data[ len(row) ].append(row)
CodePudding user response:
You can have the number of columns using len(df.columns)
but if you only want to convert a pandas df to a dictionary then there are already many built-in methods as given below,
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]},index=['row1', 'row2'])
df
col1 col2
row1 1 0.50
row2 2 0.75
df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
# You can specify the return orientation.
df.to_dict('series')
{'col1': row1 1
row2 2
Name: col1, dtype: int64,
'col2': row1 0.50
row2 0.75
Name: col2, dtype: float64}
df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
'data': [[1, 0.5], [2, 0.75]]}
df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
df.to_dict('tight')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}
# You can also specify the mapping type.
from collections import OrderedDict, defaultdict
df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
Taken from here