I am starting to learn Python and I have an issue with a pandas DataFrame. In R, even if numeric columns contain NaN values, R still infers the correct type for each column. In pandas this does not seem to be the case:
import pandas as pd

data = {
    "calories": ["NA", 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
df.dtypes  # "calories" ends up as object, not a numeric dtype
How can I get pandas to automatically detect the right data type for each column?
Thanks in advance
CodePudding user response:
"NA" is a string, use np.nan
or float('nan')
:
data = {
    "calories": [float('nan'), 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df.dtypes)
calories float64
duration int64
dtype: object
Or:
import numpy as np

data = {
    "calories": [np.nan, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df.dtypes)  # calories is float64 here as well
Note that if you use read_csv, pandas can infer NA values by default (it recognizes '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null' as missing).
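For example, here is a minimal sketch of that behavior (the CSV content is made up and fed through io.StringIO so the snippet is self-contained; with a real file you would just pass its path):

import io
import pandas as pd

# Simulate a CSV file; the "NA" string in the calories column
# matches read_csv's default NA values and becomes NaN
csv_text = "calories,duration\nNA,50\n380,40\n390,45\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # calories -> float64, duration -> int64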