Strange behavior in Python Pandas read_csv .to

I have a dataset with data formatted like this:

Name,Code
Mozambique,MZ
Myanmar,MM
Namibia,NA
Nauru,NR
Nepal,NP
Netherlands,NL

I'm loading this data into a database by first converting the CSV file to a dictionary.

I use the following command to perform the conversion:

dict_from_csv = pd.read_csv('test.csv', header=0, index_col=0, squeeze=True).to_dict()

When I do this, the value for the key Namibia comes back as nan.

I created a small test harness to confirm this:

import pandas as pd

dict_from_csv = pd.read_csv('test.csv', header=0, index_col=0, squeeze=True).to_dict()

print(dict_from_csv)

The result of running this is:

{'Mozambique': 'MZ', 'Myanmar': 'MM', 'Namibia': nan, 'Nauru': 'NR', 'Nepal': 'NP', 'Netherlands': 'NL'}

Since I'm inserting this information into a database table with a NOT NULL constraint, the nan value makes the insert fail.

I've tried wrapping the NA in the data file in double quotes and end up with the same result.

If I wrap the NA in the data file in single quotes, it does convert to a string correctly but is stored in the dictionary with the enclosing single quotes.

CodePudding user response：

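pandas.read_csv has a parameter keep_default_na. By default, read_csv treats a built-in list of strings (including NA) as missing values and converts them to NaN, which is why the code for Namibia is lost. Setting keep_default_na to False turns that default list off and solves the problem.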

The corrected line is:

dict_from_csv = pd.read_csv(
    'test.csv', header=0, index_col=0, squeeze=True, keep_default_na=False
).to_dict()
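
For anyone who wants to reproduce this without a file on disk, here is a minimal sketch, assuming the CSV content shown in the question. The names CSV_TEXT and load_codes are made up for illustration. It also assumes a recent pandas: the squeeze keyword of read_csv was removed in pandas 2.0, so the sketch calls .squeeze("columns") on the resulting DataFrame instead.

```python
import io

import pandas as pd

# Same rows as the sample data in the question (inlined for the example).
CSV_TEXT = """Name,Code
Mozambique,MZ
Myanmar,MM
Namibia,NA
Nauru,NR
Nepal,NP
Netherlands,NL
"""


def load_codes(source, keep_default_na):
    """Hypothetical helper: read a two-column CSV into a {Name: Code} dict."""
    df = pd.read_csv(
        source, header=0, index_col=0, keep_default_na=keep_default_na
    )
    # squeeze("columns") turns the single remaining column into a Series,
    # which is what squeeze=True did in older pandas versions.
    return df.squeeze("columns").to_dict()


# Default behavior: the string "NA" is parsed as a missing value.
broken = load_codes(io.StringIO(CSV_TEXT), keep_default_na=True)

# With the default NA list disabled, "NA" stays a plain string.
fixed = load_codes(io.StringIO(CSV_TEXT), keep_default_na=False)

print(pd.isna(broken["Namibia"]))
print(fixed["Namibia"])
```

Note that with keep_default_na=False and no na_values given, empty fields are no longer converted to NaN either. They come back as empty strings in a text column, so check for those separately if the table must not contain blanks.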

CodePudding user response：

Please upload your dataset (test.csv).