I have a dictionary:
matches = {282: 285,
266: 277,
276: 293,
263: 264,
286: 280,
356: 1371,
373: 262,
314: 327,
294: 290,
285: 282,
277: 266,
293: 276,
264: 263,
280: 286,
1371: 356,
262: 373,
327: 314,
290: 294}
And a df, like so:
team_id
0 327
1 293
2 373
3 282
4 314
5 263
6 280
7 354
8 264
9 294
10 1371
11 262
12 266
13 356
14 290
15 285
16 286
17 275
18 277
19 276
Now I'm trying to create an 'adversary_id' column, mapped from the dict, like so:
df['adversary_id'] = df['team_id'].map(matches)
But this new column adversary_id is being converted to type float, and two rows are ending up with NaN.
Why, if all the data is of type int?
How do I fix this, that is, how do I avoid generating NaNs and map one column onto the other without errors?
CodePudding user response:
This is because two of your team_id values (354 and 275) do not appear as keys in matches, so map produces NaN for them, and np.nan / NaN values (they are not exactly the same thing) are of type float. This cast is a limitation that unfortunately can't be avoided as long as a plain integer column contains NaN values.
You can read more in pandas' documentation here.
Because NaN is a float, a column of integers with even one missing values is cast to floating-point dtype (see Support for integer NA for more). pandas provides a nullable integer array, which can be used by explicitly requesting the dtype:
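To illustrate the quoted behavior, a minimal sketch: a regular Series with a missing value silently falls back to float64, while explicitly requesting the nullable Int64 dtype keeps integers.

```python
import pandas as pd

# a regular Series with a missing value silently becomes float64
s_float = pd.Series([1, 2, None])

# explicitly requesting the nullable Int64 dtype keeps integers,
# with the missing entry stored as pd.NA instead of float NaN
s_int = pd.Series([1, 2, None], dtype='Int64')
```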
The proposed solution is to force the type of the mapped column to pandas' nullable Int64 dtype:
df['adversary_id'] = df['team_id'].map(matches).astype('Int64')
Returning:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   team_id       20 non-null     int64
 1   adversary_id  18 non-null     Int64
dtypes: Int64(1), int64(1)
The two unmatched rows (team_ids 354 and 275) still show <NA>, but the column stays integer-typed.
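Putting it together with your data, a runnable sketch of the fix:

```python
import pandas as pd

matches = {282: 285, 266: 277, 276: 293, 263: 264, 286: 280,
           356: 1371, 373: 262, 314: 327, 294: 290, 285: 282,
           277: 266, 293: 276, 264: 263, 280: 286, 1371: 356,
           262: 373, 327: 314, 290: 294}

df = pd.DataFrame({'team_id': [327, 293, 373, 282, 314, 263, 280, 354,
                               264, 294, 1371, 262, 266, 356, 290, 285,
                               286, 275, 277, 276]})

# map, then convert the result to the nullable Int64 dtype:
# the missing keys (354 and 275 here) become pd.NA instead of float NaN
df['adversary_id'] = df['team_id'].map(matches).astype('Int64')
```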
CodePudding user response:
If I understand correctly, you could use python's built-in int() function on the values, but only after the NaN rows are removed or filled, since int() raises an error on NaN.
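Another way to avoid the problem entirely is to drop the unmatched rows before mapping, so no NaN is ever produced and the column stays plain int64. A sketch, assuming rows without a match can be discarded (the small matches dict and df2 name here are just for illustration):

```python
import pandas as pd

matches = {282: 285, 285: 282, 327: 314, 314: 327}
df = pd.DataFrame({'team_id': [282, 285, 354]})  # 354 has no match

# keep only the rows whose team_id is a key of the dict, then map:
# since no NaN is ever produced, the result stays plain int64
df2 = df[df['team_id'].isin(matches.keys())].copy()
df2['adversary_id'] = df2['team_id'].map(matches)
```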