I am trying to import a dataframe (df_model) from an excel file. The first column of this dataframe in excel file has integers 1,2,3,4,5 and I want to read them as integers instead of decimal or float values. But whenever, I try reading them through pandas, it converts the values in first column as decimal like 1.0,2.0,3.0,4.0,5.0. The values in rest of the columns however remain the way I want. Here is the dataframe that pandas read.
Std S_Ultra S_Classic ... SMV34_Ultra SMV34_Classic SMV34_Ultra for Flow
0 1.0 1A 1A ... 1.0 1.0 2.0
1 2.0 2A 2A ... 2.0 2.0 2 SP=5
2 3.0 3A 3A ... 2 SP=5 2 SP=5 3.0
3 4.0 4A 4A ... 3.0 3.0 3 SP=5
4 5.0 5A 5A ... 3 SP=5 3 SP=5 NaN
.. ... ... ... ... ... ... ...
100 NaN NaN NaN ... NaN NaN NaN
Is it possible that pandas doesnt convert the first column to decimal values by default?
CodePudding user response:
Yes, you can specify the type of the column while reading using pandas read_csv
df = pd.read_csv('filename.csv', dtype={'Std': 'Int32'})
And pandas will set the missing values as <NA>
EDIT :
As discussed in the comments, the name of the columns are not known before hand, however what is known here is that first column or nth column will contain int, float, string
data
While reading the data we can specify the column number and the data type. The column will be read in the datatype you specify. We will skip the header row and will read that separately and assign the header later.
0
is the first column number here
df = pd.read_csv(r'filename.csv', skiprows = 1, dtype={'0': 'int'}, header = None)
headers = pd.read_csv(r"filename.csv", nrows=0).columns
df.columns = headers
The above code will give you the expected output
CodePudding user response:
With pandas read_excel() or read_csv() function, you can provide it the 'dtype' param, where you can specify the type you want any column to have, for example:
In your case, you can add that param like this:
df_model= pd.read_excel('filename.xlsx', dtype={'Std': int})