Pandas converting datatypes depending on whether you get the row, then column or vice versa

Time:11-30

I just ran into this odd behavior in Pandas and was hoping someone could shed some light on it. I have a dataframe with two columns, one of integer type and the other floating point, and I want to get the value in the first row of the first column. There are two ways to do this: ask for the first row via .iloc[0] and then select the column with ['X'], or select the column first and then take its first row. The former (row first) changes the data type of the value from int to float, which is undesired, while column first preserves the data type. Is this a bug, or am I missing some nuance of Pandas?

Here's sample code to reproduce the behavior:

import numpy as np
import pandas as pd
print(np.__version__, pd.__version__, '\n')

# Create two data sets, one that is integer and one that is floating point.

data1 = np.arange(1, 5, dtype=int)
data2 = np.arange(2, 6, dtype=float)
df = pd.DataFrame(data={'X': data1, 'Y': data2})

# Verify the data types in the dataframe

print(df.dtypes, '\n')

# Get the column first, then the row. We get the expected integer type

print('Column first: ', type(df['X'].iloc[0]))

# Get the row first, then the column. We get a float instead of an integer.

print('Row first:    ', type(df.iloc[0]['X']))

which produces the following output on my system:

1.21.2 1.3.4 

X      int32
Y    float64
dtype: object 

Column first:  <class 'numpy.int32'>
Row first:     <class 'numpy.float64'>

Answer:

A Series has a single dtype. When you slice the row first, you get that row back as a Series, and its values are typed as float because the second column holds floats: the int is upcast to float. The same kind of upcasting happens when a NaN ends up in an int Series.
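To make the upcast visible, here is a minimal sketch that rebuilds the dataframe from the question and inspects the dtype of the row Series directly. The NaN part uses reindex to introduce a missing value; that is just one convenient way to do it, chosen here as an assumption, and the variable names row, s and s_with_nan are illustrative only.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: one integer column, one float column.
df = pd.DataFrame({'X': np.arange(1, 5, dtype=int),
                   'Y': np.arange(2, 6, dtype=float)})

# Selecting a row returns a Series, which can only hold one dtype,
# so the int from 'X' and the float from 'Y' are stored as float64.
row = df.iloc[0]
print(row.dtype)            # float64
print(type(row['X']))       # a NumPy float, no longer an integer

# The same upcast appears when a NaN ends up in an integer Series.
# Reindexing to a label that does not exist (3) introduces a NaN.
s = pd.Series([1, 2, 3])
s_with_nan = s.reindex([0, 1, 2, 3])
print(s.dtype, '->', s_with_nan.dtype)
```

The row Series reports float64 even though column X in the dataframe itself is still an integer column, which is why the column-first access is unaffected.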

By slicing the column first, you keep the int type.

That said, there is a third way to get what you want: slice both at once

df.iloc[0,0]
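As a quick check of the single-step access, the sketch below compares it with the row-first form. It also shows .iat and .at, which are not mentioned above but are the pandas scalar accessors for the same kind of lookup; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': np.arange(1, 5, dtype=int),
                   'Y': np.arange(2, 6, dtype=float)})

# Row and column in one indexing step: no intermediate row Series
# is built, so the value keeps the dtype of column 'X'.
both_at_once = df.iloc[0, 0]

# Scalar accessors: .iat is positional, .at is label-based.
via_iat = df.iat[0, 0]
via_at = df.at[0, 'X']

# Row first, for comparison: goes through a float64 row Series.
row_first = df.iloc[0]['X']

print(type(both_at_once), type(via_iat), type(via_at))
print(type(row_first))
```

The exact integer type printed (for example int32 or int64) depends on the platform and NumPy version, as the output in the question shows.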