Difference between .values and .iloc on a pandas Series

I am currently refactoring some code where I see both of these lines being used:

foo = df['bar'].values[0]
foo = df['bar'].iloc[0]

From my current understanding, both lines do the same thing: retrieving the first value of the pandas series.

Are they really the same? If yes, is one way recommended over the other (because of internal subtleties, speed, behavior when setting a value instead of getting one, etc.)?

CodePudding user response：

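I think that most of the time the output is the same, unless you work with datetimes. .values and Series.to_numpy() return a NumPy array, so taking its first element gives a numpy.datetime64, whereas .iloc[0] gives a pandas Timestamp: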

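import pandas as pd

# quarter-end dates; on pandas >= 2.2 the alias 'QE' is preferred ('Q' is deprecated there)
df = pd.DataFrame({'bar':pd.date_range('2001', freq='Q', periods=5)})
print(df)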
         bar
0 2001-03-31
1 2001-06-30
2 2001-09-30
3 2001-12-31
4 2002-03-31

foo = df['bar'].to_numpy()[0]
print(foo)
2001-03-31T00:00:00.000000000

print(type(foo))
<class 'numpy.datetime64'>

foo = df['bar'].values[0]
print(foo)
2001-03-31T00:00:00.000000000

print(type(foo))
<class 'numpy.datetime64'>

foo = df['bar'].iloc[0]
print(foo)
2001-03-31 00:00:00

print(type(foo))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
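To make the comparison above easier to reproduce, here is a minimal sketch that puts a numeric column and a datetime column side by side. The names nums and dates and the sample values are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Numeric column: both accessors return the same value with the same type
nums = pd.Series([10, 20, 30], name="bar")
via_values = nums.values[0]
via_iloc = nums.iloc[0]
same_numeric = bool(via_values == via_iloc) and type(via_values) is type(via_iloc)

# Datetime column: the same instant, but wrapped in different types
dates = pd.Series(pd.to_datetime(["2001-03-31", "2001-06-30"]), name="bar")
dt_values = dates.values[0]  # element of the NumPy array
dt_iloc = dates.iloc[0]      # element returned by pandas indexing
same_instant = bool(pd.Timestamp(dt_values) == dt_iloc)

print(type(via_values).__name__, type(via_iloc).__name__)
print(type(dt_values).__name__, type(dt_iloc).__name__)
print(same_numeric, same_instant)
```

So the two lines are interchangeable for plain numeric data, but code that later calls Timestamp methods (for example .strftime) on the result would only work with the .iloc version.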

CodePudding user response：

df.values is a property that returns a numpy.ndarray, so it is usable on its own, without square brackets. Indexing the result is plain NumPy indexing.

df[col].values
df[col].values[0] # 1st element of numpy array
df[col].values[1:3] # 2nd and 3rd element of numpy array

Meanwhile, df.iloc is position-based indexing for getting elements from a DataFrame. iloc is meant to be used with square brackets; without them you do not get an error, but you only get the indexer object, not any data.

df.iloc # the indexer object itself, no data yet
df.iloc[row, col] # Returns a scalar, a Series, or a DataFrame, depending on the input
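As a rough illustration of how the return type of iloc depends on what goes inside the brackets, consider the sketch below. The DataFrame and its column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

cell = df.iloc[0, 1]       # one row, one column -> scalar
row = df.iloc[0]           # one row, all columns -> Series
block = df.iloc[0:2, 0:2]  # slices on both axes -> DataFrame

print(type(cell).__name__, type(row).__name__, type(block).__name__)
```

The slice 0:2 follows Python conventions, so it selects positions 0 and 1 and excludes position 2.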

The subtle difference lies in the object being returned, and also the implementation behind the scenes.

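iloc goes through pandas' indexing machinery and returns a pandas object (or a scalar).

values first exposes the data as a numpy.ndarray, and the indexing is then done by NumPy. For a single-dtype Series this usually needs no copy, so .values[0] is not necessarily slower than .iloc[0] (in micro-benchmarks it is often faster). For a DataFrame with mixed dtypes, .values has to build a new array with a common dtype, which can be expensive. If speed matters, time both on your own data.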
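One way to check what .values does with the underlying data is to look at memory sharing and at the resulting dtype. This is only a sketch on toy data; the variable names are hypothetical, and the result may differ for extension dtypes or other pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], name="bar")
# Two calls to .values on a plain integer Series: do the arrays share one buffer?
shares_buffer = np.shares_memory(s.values, s.values)

mixed = pd.DataFrame({"n": [1, 2], "s": ["x", "y"]})
# Mixed-dtype DataFrame: the columns are combined into one array with a common dtype
mixed_dtype = mixed.values.dtype

print(shares_buffer, mixed_dtype)
```

If the buffer is shared, .values on that Series is a cheap view and any speed difference comes from indexing overhead rather than from a conversion. The mixed DataFrame is the case where a real conversion takes place.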