Pandas numeric column considered as string if NaN is inside

Time:11-28

I am starting to learn Python and I have an issue with a pandas DataFrame. In R, even if a numeric column contains NaN values, R still infers the correct data type for each column. In pandas this does not seem to be the case:

import pandas as pd

data = {
    "calories": ["NA", 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data)
# calories comes out as object because "NA" is a string
print(df.dtypes)

How can I get pandas to automatically detect the right data type for each column?

Thanks in advance

CodePudding user response:

"NA" is a string, so pandas stores the whole column as object. Use np.nan or float('nan') for the missing value instead:

import pandas as pd

data = {
    "calories": [float('nan'), 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data)
print(df.dtypes)

calories    float64
duration      int64
dtype: object

Or:

import numpy as np
import pandas as pd

data = {
    "calories": [np.nan, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
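
If the data already arrives with the string "NA" in it and you cannot change how it is built, one possible approach is to convert the column afterwards. This is only a sketch and not the only way to do it: it uses pd.to_numeric with errors="coerce", which turns every value that cannot be parsed as a number into NaN (not just "NA"), so check that this is what you want before relying on it.

```python
import pandas as pd

# Same data as in the question, with the string "NA" left in place
df = pd.DataFrame({
    "calories": ["NA", 380, 390],
    "duration": [50, 40, 45],
})

# Coerce anything non-numeric in the column to NaN
df["calories"] = pd.to_numeric(df["calories"], errors="coerce")

print(df.dtypes)
```

After the conversion, calories should be float64 with one NaN, and duration is left untouched.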

Note that if you use read_csv, pandas can infer NA values (by default, '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null').
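
A small sketch of the read_csv behaviour, assuming a made-up in-memory CSV (csv_text is only a stand-in for a real file) with the same values as in the question:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = "calories,duration\nNA,50\n380,40\n390,45\n"

# "NA" is in the default list of missing-value markers, so it is read as NaN
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)
```

Here calories should be read as float64 without any extra arguments. If your file uses a different marker for missing values, read_csv also accepts a na_values argument to add it to the list.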
