I add a calculated column c, containing only integers, to a DataFrame.
import numpy as np
import pandas as pd

df = pd.DataFrame(data=list(zip(*[np.random.randint(1,3,5), np.random.random(5)])), columns=['a', 'b'])
df['c'] = np.ceil(df.a/df.b).astype(int)
df.dtypes
The DataFrame reports that the column type of c is indeed int:
a int64
b float64
c int32
dtype: object
If I access a value from c like this, I get an int:
df.c.values[0] # Returns "3"
type(df.c.values[0]) # Returns "numpy.int32"
But if I access the same value using iloc, I get a float:
df.iloc[0].c # Returns "3.0"
type(df.iloc[0].c) # Returns "numpy.float64"
Why is this?
I would like to be able to access the value using indexes without having to cast it (again) to an int.
CodePudding user response:
It looks like when you access df.iloc[0].c, pandas first has to evaluate df.iloc[0], which returns the whole row, all three columns, as a single Series. A Series has one dtype, so the row is cast to a type that can represent all three columns, which is numpy.float64.
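The upcast described above can be checked directly. This is a minimal sketch that assumes small fixed values in place of the question's random data, so the numbers are hypothetical; the dtypes are what matters.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data standing in for the question's random values.
df = pd.DataFrame({"a": [1, 2], "b": [0.25, 0.5]})
df["c"] = np.ceil(df.a / df.b).astype(int)

# Selecting one row returns a Series, and a Series has a single dtype.
row = df.iloc[0]
print(row.dtype)      # float64
print(type(row.c))    # <class 'numpy.float64'>

# NumPy's promotion rule for the three column dtypes gives the same result.
common = np.result_type(*df.dtypes)
print(common)         # float64
```

The integer columns are not changed in df itself; only the temporary row Series is float64.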
Interestingly enough, I can avoid this by adding a string column. The row Series then has dtype object, so each value keeps its own type.
df = pd.DataFrame(data=list(zip(*[np.random.randint(1,3,5), np.random.random(5)])), columns=['a', 'b'])
df['c'] = np.ceil(df.a/df.b).astype(int)
df['d'] = ['hi', 'bye', 'hello', 'cya', 'sup']
print(df.iloc[0].c)
print(type(df.iloc[0].c))
print(df.dtypes)
To answer your actual question, you can avoid this whole mess by using df.loc[0, 'c'] instead of df.iloc[0].c. That selects the single value directly, without building the row first.
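As a rough illustration, here are a few ways to read a single cell without going through an intermediate row Series. The data is hypothetical fixed data, and the variable names are only for this sketch. All four lookups should return a NumPy integer rather than a float.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data standing in for the question's random values.
df = pd.DataFrame({"a": [1, 2], "b": [0.25, 0.5]})
df["c"] = np.ceil(df.a / df.b).astype(int)

by_label = df.loc[0, "c"]                     # row label and column label
by_at = df.at[0, "c"]                         # scalar-only label lookup
by_pos = df.iloc[0, df.columns.get_loc("c")]  # purely positional lookup
by_col = df["c"].iloc[0]                      # column first, then row

for value in (by_label, by_at, by_pos, by_col):
    print(value, type(value))
```

Selecting the column before the row (the last form) works because a single column already has one integer dtype, so nothing needs to be upcast.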
CodePudding user response:
- When I execute your code, the result is this DataFrame:
df
a b c
0 1 0.315388 4
1 1 0.111275 9
2 1 0.251253 4
3 2 0.043162 47
4 1 0.047985 21
- When I type in the interpreter
df['c'].values
I get this:
array([ 4, 9, 4, 47, 21])
That is, all the values of column c.
- When I type in the interpreter
df.iloc[0]
I get the following values:
a 1.000000
b 0.315388
c 4.000000
Name: 0, dtype: float64
That is, the values of the first row of df.
What we can notice
All the values of column c are integers, whereas the values of the first row are not all of the same type: there are two integers and one float. This fact is very important.
Indeed, by definition an array is a collection of elements of the same type.
So to hold a float in a collection whose other values are integers, every element must be converted to float to respect this rule, because a float can represent an integer value (4 becomes 4.0) but an integer cannot represent a value such as 0.315388.
Conclusion
The type of a collection of integers is int.
The type of a collection of floats is float.
A collection of integers containing at least one float is converted to float.
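These three rules can be seen with plain NumPy arrays. The following is a small sketch with made-up values; the exact integer width (int32 or int64) depends on the platform, so only the dtype kind is checked for the integer case.

```python
import numpy as np

ints = np.array([1, 4])              # only integers
floats = np.array([0.315388, 0.5])   # only floats
mixed = np.array([1, 0.315388, 4])   # integers plus one float

print(ints.dtype.kind)   # 'i' (an integer dtype)
print(floats.dtype)      # float64
print(mixed.dtype)       # float64: the integers were converted too
print(type(mixed[0]))    # <class 'numpy.float64'>
```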
Quote
"An array is a concept that stores different items of the same type together as one and makes calculating the stance of each element easier by adding an offset to the base number." (codeinstitute.net)