Pandas read_csv() cannot read the string "null"

If I have this CSV:

"col1"
"hi"

It is read correctly using this code:

import pandas
df = pandas.read_csv("test.csv")
print(list(df["col1"]))

and it prints:

['hi']

But if I change the string "hi" to "null" in the CSV, it fails!

It now prints:

[nan]

My actual CSV is quite large, and it happens to contain the string "null" as a field value somewhere, so that value is not read correctly.

Any workarounds?

Answer:

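Update

Using keep_default_na=False is the right way to go. "null" is one of the strings read_csv treats as missing by default (along with "NULL", "NA", "NaN", "n/a" and others), and keep_default_na=False switches that default list off.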
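A minimal sketch of the keep_default_na approach, using an in-memory CSV (io.StringIO) as a hypothetical stand-in for test.csv. Passing na_values=[""] alongside keep_default_na=False goes slightly beyond the answer: according to the pandas docs, once the default list is off and na_values is given, only the listed values are parsed as NaN. Empty fields therefore still come back as NaN while "null" stays a string. With keep_default_na=False alone, no strings are parsed as NaN at all, so an empty field would be read as an empty string instead.

```python
import io

import pandas as pd

# Hypothetical data standing in for test.csv: row 1 holds the literal
# string "null", row 2 has an empty col1 field, row 3 is ordinary text.
CSV_TEXT = "col1,col2\nnull,1\n,2\nhi,3\n"

# Default behaviour: "null" is in pandas' built-in NA list, so it is
# turned into NaN exactly like the empty field.
default = pd.read_csv(io.StringIO(CSV_TEXT))
print(default["col1"].tolist())

# Switch off the built-in NA list and treat only empty fields as missing.
fixed = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False, na_values=[""])
print(fixed["col1"].tolist())
```

Unlike the fill-after-reading approach, this keeps a real "null" string distinguishable from a genuinely missing value.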

Original, clumsier solution:

Using replace can do the job. Note that the code below replaces all NaN values across the whole DataFrame, so fields that were genuinely empty in the CSV also become "null".

You can restrict the replacement to specific columns with:

df[['col1']] = df[['col1']].fillna('null')

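import pandas as pd
import numpy as np

df = pd.read_csv("test.csv")
print('before:')
print(list(df["col1"]))

# Replace every NaN in the DataFrame with the string 'null'
# (regex=True is not needed here, since NaN is not a pattern)
df = df.replace(np.nan, 'null')
print('after:')
print(list(df["col1"]))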

Output:

before:
[nan]
after:
['null']
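A short sketch of the limitation of filling after reading, again with a hypothetical in-memory CSV in place of test.csv. It applies the column-restricted fillna idea to col1 only: because the empty field and the "null" string are both NaN by the time fillna runs, both end up as "null".

```python
import io

import pandas as pd

# Hypothetical data: row 1 is the literal string "null",
# row 2 is genuinely empty, row 3 is ordinary text.
CSV_TEXT = "col1,col2\nnull,1\n,2\nhi,3\n"

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Post-hoc fill restricted to col1: the empty field cannot be told apart
# from the original "null" string, so both become "null".
df["col1"] = df["col1"].fillna("null")
print(df["col1"].tolist())
```

If the data never contains empty fields, this is harmless; otherwise the keep_default_na route is the safer choice.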