How to make sure all columns are read as string in pandas

Time:10-03

I thought this would make all values be read as strings, but it doesn't:

df = pd.read_csv(file, sep='\t', dtype=str, low_memory=False)

When I do this:

for index, row in df.iterrows():
     id_value = row['id']
     ...

I get an error saying that 'id_value' is a float, which can't be concatenated with a str.

Why doesn't dtype=str make every value in the dataframe a string?

CodePudding user response:

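According to the read_csv documentation, dtype=str alone is not enough: you also have to control how missing values are parsed. Note that na_values="" only adds to the default list of NaN markers, so empty fields are still read as NaN. To keep them as empty strings, pass keep_default_na=False (or na_filter=False) together with dtype=str. The documentation says: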

Use str or object together with suitable na_values settings to preserve and not interpret dtype.

NaN is a float (unless you convert to the new pandas.NA), so if you have missing values, this is likely the origin of your error.
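A minimal sketch of the difference, assuming a small made-up tab-separated input where one 'id' field is empty (the data and variable names are hypothetical, not taken from the question):

```python
import io
import pandas as pd

# Hypothetical tab-separated data; the second data row has an empty 'id' field.
tsv = "id\tname\n001\talice\n\tbob\n"

# dtype=str alone: the empty field is still parsed as a missing value (NaN),
# which is a float rather than a str.
df_default = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str)
print(type(df_default.loc[1, "id"]))

# keep_default_na=False stops empty fields (and markers such as "NA")
# from being converted to NaN, so the value stays an empty string.
df_strings = pd.read_csv(
    io.StringIO(tsv), sep="\t", dtype=str, keep_default_na=False
)
print(repr(df_strings.loc[1, "id"]))
```

If the file is already loaded, calling fillna("") on the dataframe is another way to get rid of the float NaN values.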

Also, I am not sure which operation you want to perform, but if you vectorize it (i.e. operate on whole columns instead of using iterrows), pandas should handle the NaNs automatically.
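As an illustration, here is a sketch of a column-wise concatenation, assuming the goal is to join 'id' with another string column (the 'name' column and the "_" separator are made up for the example):

```python
import io
import pandas as pd

# Hypothetical tab-separated data; the second data row has an empty 'id' field.
tsv = "id\tname\n001\talice\n\tbob\n"
df = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str)

# Column-wise concatenation: rows with a missing id yield NaN
# instead of raising a TypeError.
df["label"] = df["id"] + "_" + df["name"]

# If an empty string is preferred over NaN, fill missing values first.
df["label_filled"] = df["id"].fillna("") + "_" + df["name"]

print(df)
```

Whether NaN or an empty string is the right result for rows with a missing id depends on what the labels are used for afterwards.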
