Home > Mobile >  Pandas read columns from csv in a given data type with unknown column name
Pandas read columns from csv in a given data type with unknown column name

Time:07-13

I am trying to import a dataframe (df_model) from an excel file. The first column of this dataframe in excel file has integers 1,2,3,4,5 and I want to read them as integers instead of decimal or float values. But whenever, I try reading them through pandas, it converts the values in first column as decimal like 1.0,2.0,3.0,4.0,5.0. The values in rest of the columns however remain the way I want. Here is the dataframe that pandas read.

    Std S_Ultra S_Classic  ... SMV34_Ultra SMV34_Classic SMV34_Ultra for Flow
0    1.0      1A        1A  ...         1.0           1.0                  2.0
1    2.0      2A        2A  ...         2.0           2.0               2 SP=5
2    3.0      3A        3A  ...      2 SP=5        2 SP=5                  3.0
3    4.0      4A        4A  ...         3.0           3.0               3 SP=5
4    5.0      5A        5A  ...      3 SP=5        3 SP=5                  NaN
..   ...     ...       ...  ...         ...           ...                  ...
100  NaN     NaN       NaN  ...         NaN           NaN                  NaN

Is it possible that pandas doesnt convert the first column to decimal values by default?

CodePudding user response:

Yes, you can specify the type of the column while reading using pandas read_csv

df = pd.read_csv('filename.csv', dtype={'Std': 'Int32'})

And pandas will set the missing values as <NA>

EDIT : As discussed in the comments, the name of the columns are not known before hand, however what is known here is that first column or nth column will contain int, float, string data

While reading the data we can specify the column number and the data type. The column will be read in the datatype you specify. We will skip the header row and will read that separately and assign the header later.

0 is the first column number here

df = pd.read_csv(r'filename.csv', skiprows = 1,  dtype={'0': 'int'}, header = None)
headers = pd.read_csv(r"filename.csv", nrows=0).columns
df.columns = headers

The above code will give you the expected output

CodePudding user response:

With pandas read_excel() or read_csv() function, you can provide it the 'dtype' param, where you can specify the type you want any column to have, for example:

In your case, you can add that param like this:

df_model= pd.read_excel('filename.xlsx', dtype={'Std': int})
  • Related