I have this dataframe:
Python 3.9.0 (v3.9.0:9cf6752276, Oct 5 2020, 11:29:23)
[Clang 6.0 (clang-600.0.57)] on darwin
>>> import pandas as pd
>>> import datetime
>>> pd.__version__
'1.3.5'
>>> dates = [datetime.datetime(2012, 2, 3) , datetime.datetime(2012, 2, 4)]
>>> x = pd.DataFrame({'Time': dates, 'Selected': [0, 0], 'Nr': [123.4, 25.2]})
>>> x.set_index('Time', inplace=True)
>>> x
            Selected     Nr
Time
2012-02-03         0  123.4
2012-02-04         0   25.2
In the example below, an integer value from an integer column comes back as a float, and I do not see the reason for this conversion. In both cases I assume I am picking the value from the 'Selected' column in the first row. What is going on?
>>> x['Selected'].iloc[0]
0
>>> x.iloc[0]['Selected']
0.0
>>> x['Selected'].dtype
dtype('int64')
CodePudding user response:
x.iloc[0] selects a single "row", and doing so actually creates a new pd.Series object. A Series has exactly one dtype, so pandas has to pick a type that can hold every value in that row. It uses a floating point type, since that can hold the integer from the "Selected" column without losing the information in the "Nr" column.
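To make the upcast visible, here is a minimal sketch that rebuilds the frame from the question and compares the per-column dtypes with the dtype of the extracted row. The variable names (row, int_only_row and so on) are only illustrative, and the single-column comparison at the end is my addition, not something from the question.

```python
import datetime

import pandas as pd

# Rebuild the frame from the question.
dates = [datetime.datetime(2012, 2, 3), datetime.datetime(2012, 2, 4)]
x = pd.DataFrame({"Time": dates, "Selected": [0, 0], "Nr": [123.4, 25.2]})
x = x.set_index("Time")

# Each column keeps its own dtype inside the DataFrame.
column_dtypes = x.dtypes

# Extracting a row builds a new Series, which can only have one dtype.
row = x.iloc[0]
row_dtype = row.dtype

# If every selected column is integer, no upcast is needed for the row.
int_only_row = x[["Selected"]].iloc[0]
int_only_dtype = int_only_row.dtype

print(column_dtypes)
print("row dtype:", row_dtype)
print("integer-only row dtype:", int_only_dtype)
```

The row taken from the mixed int/float frame should report a float dtype, while the row taken from the integer-only selection should stay integer.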
On the other hand, x['Selected'].iloc[0] first selects a column, which will always preserve the dtype, and only then picks the first element from it.
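If you want a single value without going through an intermediate row, the following sketch shows a few scalar lookups that should keep the column's integer dtype. The frame is the one from the question; the variable names are illustrative only.

```python
import datetime

import pandas as pd

# Rebuild the frame from the question.
dates = [datetime.datetime(2012, 2, 3), datetime.datetime(2012, 2, 4)]
x = pd.DataFrame({"Time": dates, "Selected": [0, 0], "Nr": [123.4, 25.2]})
x = x.set_index("Time")

first_label = x.index[0]

# Column first, then position: the column's dtype is preserved.
via_column = x["Selected"].iloc[0]

# Row and column position in one call: no intermediate row Series.
via_iloc = x.iloc[0, x.columns.get_loc("Selected")]

# Label-based scalar lookups.
via_at = x.at[first_label, "Selected"]
via_loc = x.loc[first_label, "Selected"]

# Row first, then column: the value has already been upcast with the row.
via_row = x.iloc[0]["Selected"]

for name, value in [
    ("column then iloc", via_column),
    ("iloc[row, col]", via_iloc),
    ("at", via_at),
    ("loc", via_loc),
    ("row then column", via_row),
]:
    print(name, type(value).__name__, value)
```

Only the last lookup goes through a row Series, so it is the only one that should come back as a float.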
pandas is fundamentally "column oriented". You can think of a dataframe as a dictionary of columns. (It isn't literally one: I believe it used to essentially have that under the hood, and it now uses a more complex "block manager" approach, but these are internal implementation details.)