I have a dictionary:
matches = {282: 285,
266: 277,
276: 293,
263: 264,
286: 280,
356: 1371,
373: 262,
314: 327,
294: 290,
285: 282,
277: 266,
293: 276,
264: 263,
280: 286,
1371: 356,
262: 373,
327: 314,
290: 294}
And a df, like so:
team_id
0 327
1 293
2 373
3 282
4 314
5 263
6 280
7 354
8 264
9 294
10 1371
11 262
12 266
13 356
14 290
15 285
16 286
17 275
18 277
19 276
Now I'm trying to create an 'adversary_id' column, mapped from the dict, like so:
df['adversary_id'] = df['team_id'].map(matches)
But this new column, adversary_id, is being converted to type float, and two rows are ending up with NaN:
team_id adversary_id
0 327 314.0
1 293 276.0
2 373 262.0
3 282 285.0
4 314 327.0
5 263 264.0
6 280 286.0
7 354 NaN
8 264 263.0
9 294 290.0
10 1371 356.0
11 262 373.0
12 266 277.0
13 356 1371.0
14 290 294.0
15 285 282.0
16 286 280.0
17 275 NaN
18 277 266.0
19 276 293.0
Why, if all the data is of type int?
How do I fix this, that is, how do I avoid generating NaNs and map one column onto the other without errors?
CodePudding user response:
This is because the np.nan/NaN values you see in the dataframe are of type float (np.nan and Python's float('nan') are distinct objects, but both are plain floats). The two team_id values 354 and 275 have no key in matches, so map() produces NaN for those rows, and a single NaN is enough to force the whole column to a floating-point dtype. Unfortunately, this can't be avoided as long as the column contains NaN values.
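You can verify this quickly (a small illustrative snippet):
import numpy as np
print(type(np.nan))      # <class 'float'> -> NaN is stored as a plain float
print(np.nan == np.nan)  # False -> NaN never compares equal to itself; use pd.isna() to test for it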
You can read more in the pandas documentation on nullable integer data types, which explains:
Because NaN is a float, a column of integers with even one missing value is cast to floating-point dtype (see Support for integer NA for more). pandas provides a nullable integer array, which can be used by explicitly requesting the dtype:
The proposed solution is to force the type on the mapped column by explicitly requesting pandas' nullable integer dtype:
df['adversary_id'] = df['team_id'].map(matches).astype(pd.Int64Dtype())
Returning:
    team_id  adversary_id
0       327           314
1       293           276
2       373           262
3       282           285
4       314           327
5       263           264
6       280           286
7       354          <NA>
8       264           263
9       294           290
10     1371           356
11      262           373
12      266           277
13      356          1371
14      290           294
15      285           282
16      286           280
17      275          <NA>
18      277           266
19      276           293
The column dtype is now the nullable Int64 instead of float64, so the values stay integers and the two unmatched rows show up as <NA> rather than NaN.
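Alternatively, if you want to avoid generating NaNs in the first place and keep a plain int64 column, you can drop the rows whose team_id has no entry in the dictionary before mapping (a minimal sketch, assuming unmatched rows can simply be discarded):
df = df[df['team_id'].isin(matches.keys())].copy()
df['adversary_id'] = df['team_id'].map(matches)  # every team_id now has a match, so no NaN is produced and the dtype stays int64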
CodePudding user response:
If I understand correctly, you could just convert the values back with Python's built-in int() function, as long as the rows that map to NaN are handled first (int() cannot convert NaN).
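For example (a minimal sketch; the -1 fill value is just an illustrative sentinel for the two unmatched rows, since int() cannot convert NaN):
df['adversary_id'] = df['team_id'].map(matches).fillna(-1).apply(int)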