I'm trying to pivot a dataframe but it keeps returning an Int64 Error. A similar question was not actually answered - What causes these Int64 columns to cause a TypeError?
Here's the type of my dataframe:
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 30159 non-null Int64
1 type 30159 non-null object
2 size 30155 non-null Int64
3 location 30159 non-null object
4 neighborhood 30159 non-null object
dtypes: Int64(2), object(3)
the pivot table code:
pfraw = pd.pivot_table(pfraw, values = 'price', index = 'neighborhood', columns = 'type')
and the lat bit of the error message:
273 dtype = np.dtype(dtype)
275 if not isinstance(dtype, np.dtype):
276 # enforce our signature annotation
--> 277 raise TypeError(dtype) # pragma: no cover
279 converted = maybe_downcast_numeric(result, dtype, do_round)
280 if converted is not result:
TypeError: Int64
I dont understand why would it return an error with Int64.
CodePudding user response:
First of all, let's create a df similar to the one OP has
import pandas as pd
df = pd.DataFrame( {'price': [10, 12, 18, 10, 12], 'type': ['A', 'A', 'A', 'B', 'B'], 'size': [10, 12, 18, 10, 12], 'location': ['A', 'A', 'A', 'B', 'B'], 'neighborhood': ['A', 'A', 'A', 'B', 'B']})
If one prints the df one will see that this one has int64
and not Int64
(as opposed to OP's). Note: On my answer here one finds the difference between the two dtypes.
print(df.info(verbose=True))
[Out]:
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 5 non-null int64
1 type 5 non-null object
2 size 5 non-null int64
3 location 5 non-null object
4 neighborhood 5 non-null object
And, with an int64
one will be able to create the pivot table with index "neighborhood", columns "type", and values "price", with the following
df_pivot = df.pivot_table(index='neighborhood', columns='type', values='price')
This is the output
type A B
neighborhood
A 13.333333 NaN
B NaN 11.0
However, with Int64
the Pivot Table can generate an error.
In order to handle that, one will need convert the type to int64
df[['price', 'size']] = df[['price', 'size']].astype('int64')
or
import numpy as np
df[['price', 'size']] = df[['price', 'size']].astype(np.int64)
Also, most likely, OP has missing values. The fastest way to handle that is to remove the rows with missing values. In order to find and remove the missing values, my answer here may be of help.
For the reference, this is a direct link to the module maybe_downcast_to_dtype that is raising the error that OP is having.