I have a dictionary:
matches = {282: 285,
266: 277,
276: 293,
263: 264,
286: 280,
356: 1371,
373: 262,
314: 327,
294: 290,
285: 282,
277: 266,
293: 276,
264: 263,
280: 286,
1371: 356,
262: 373,
327: 314,
290: 294}
And a df, like so:
team_id
0 327
1 293
2 373
3 282
4 314
5 263
6 280
7 354
8 264
9 294
10 1371
11 262
12 266
13 356
14 290
15 285
16 286
17 275
18 277
19 276
Now I'm trying to create an 'adversary_id' column, mapped from the dict, like so:
df['adversary_id'] = df['team_id'].map(matches)
But this new column, adversary_id, is being converted to type float, and two rows are ending up with NaN:
team_id adversary_id
0 327 314.0
1 293 276.0
2 373 262.0
3 282 285.0
4 314 327.0
5 263 264.0
6 280 286.0
7 354 NaN
8 264 263.0
9 294 290.0
10 1371 356.0
11 262 373.0
12 266 277.0
13 356 1371.0
14 290 294.0
15 285 282.0
16 286 280.0
17 275 NaN
18 277 266.0
19 276 293.0
Why, if all the data is of type int?
How do I fix this, that is, how do I avoid generating NaNs and map one column onto the other without errors?
CodePudding user response:
This is because the np.nan/NaN values you see in the dataframe are of type float (np.nan and Python's float('nan') are distinct objects, but both are plain floats). The two team_id values 354 and 275 have no key in matches, so map() produces NaN for those rows, and a single NaN is enough to force the whole column to a floating-point dtype. Unfortunately, this can't be avoided as long as the column contains NaN values.
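You can verify this quickly (a small illustrative snippet):
import numpy as np
print(type(np.nan))      # <class 'float'> -> NaN is stored as a plain float
print(np.nan == np.nan)  # False -> NaN never compares equal to itself; use pd.isna() to test for it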
You can read more in the pandas documentation on nullable integer data types, which explains:
Because NaN is a float, a column of integers with even one missing value is cast to floating-point dtype (see Support for integer NA for more). pandas provides a nullable integer array, which can be used by explicitly requesting the dtype:
The proposed solution is to force the type on the mapped column by explicitly requesting pandas' nullable integer dtype:
df['adversary_id'] = df['team_id'].map(matches).astype(pd.Int64Dtype())
Returning:
    team_id  adversary_id
0       327           314
1       293           276
2       373           262
3       282           285
4       314           327
5       263           264
6       280           286
7       354          <NA>
8       264           263
9       294           290
10     1371           356
11      262           373
12      266           277
13      356          1371
14      290           294
15      285           282
16      286           280
17      275          <NA>
18      277           266
19      276           293
The column dtype is now the nullable Int64 instead of float64, so the values stay integers and the two unmatched rows show up as <NA> rather than NaN.
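Alternatively, if you want to avoid generating NaNs in the first place and keep a plain int64 column, you can drop the rows whose team_id has no entry in the dictionary before mapping (a minimal sketch, assuming unmatched rows can simply be discarded):
df = df[df['team_id'].isin(matches.keys())].copy()
df['adversary_id'] = df['team_id'].map(matches)  # every team_id now has a match, so no NaN is produced and the dtype stays int64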
CodePudding user response:
If I understand correctly, you could just convert the values back with Python's built-in int() function, as long as the rows that map to NaN are handled first (int() cannot convert NaN).
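For example (a minimal sketch; the -1 fill value is just an illustrative sentinel for the two unmatched rows, since int() cannot convert NaN):
df['adversary_id'] = df['team_id'].map(matches).fillna(-1).apply(int)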