Pandas - map converting int to floats


I have a dictionary:

matches = {282: 285,
 266: 277,
 276: 293,
 263: 264,
 286: 280,
 356: 1371,
 373: 262,
 314: 327,
 294: 290,
 285: 282,
 277: 266,
 293: 276,
 264: 263,
 280: 286,
 1371: 356,
 262: 373,
 327: 314,
 290: 294}

And a df, like so:

    team_id 
0   327 
1   293 
2   373 
3   282 
4   314 
5   263 
6   280
7   354 
8   264 
9   294 
10  1371    
11  262 
12  266 
13  356 
14  290 
15  285 
16  286 
17  275 
18  277 
19  276 

Now I'm trying to create an 'adversary_id' column, mapped from the dict, like so:

df['adversary_id'] = df['team_id'].map(matches)

But this new column adversary_id is being converted to type float, and two rows are ending up with NaN.

Why, if all of the data is of type int?

How do I fix this, i.e. how do I avoid the NaNs being generated and map one column onto the other without errors?

CodePudding user response:

This is because the np.nan / NaN values (they are not exactly the same thing) that you see in the DataFrame are of type float. The two NaN rows appear because the team_id values 354 and 275 are not keys in the matches dictionary, so .map() returns NaN for them, and a single NaN is enough to promote the whole column to float. It is a limitation that unfortunately can't be avoided as long as the column contains NaN values.
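For instance, a minimal sketch (with the dictionary abbreviated) showing how one unmatched key is enough to push the mapped column to float64:

import pandas as pd

matches = {282: 285, 266: 277}                  # abbreviated version of the dict
df = pd.DataFrame({'team_id': [282, 266, 354]})

# 354 has no key in `matches`, so map() produces NaN for that row,
# and that single NaN forces the whole column to float64.
df['adversary_id'] = df['team_id'].map(matches)
print(df['adversary_id'].dtype)                 # float64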

Kindly read more in the pandas documentation on working with missing data:

Because NaN is a float, a column of integers with even one missing value is cast to floating-point dtype (see Support for integer NA for more). pandas provides a nullable integer array, which can be used by explicitly requesting the dtype:

The proposed solution is to force the nullable integer dtype explicitly:

df['team_id'] = pd.Series(df['team_id'], dtype=pd.Int64Dtype())

Returning an info() summary like the following (taken from an illustrative example, so the column name and row count differ from your DataFrame):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Example  4 non-null      Int64
dtypes: Int64(1)
memory usage: 173.0 bytes
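Applied to the column that actually ends up with the NaNs, the same idea looks roughly like this (a sketch, assuming a reasonably recent pandas version that supports casting a float column containing NaN to the nullable Int64 dtype):

# Map first, then request the nullable integer dtype so the two
# unmatched rows become <NA> instead of forcing the column to float.
df['adversary_id'] = df['team_id'].map(matches).astype('Int64')
print(df['adversary_id'].dtype)   # Int64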

CodePudding user response:

If I understand correctly, you could just use Python's built-in int() function.
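Note that a plain integer cast only works once the NaNs are gone, so one way to read this suggestion is to fill the unmatched rows with a sentinel first (a sketch; the -1 sentinel is purely illustrative):

# astype(int) raises on NaN, so replace unmatched rows before casting.
df['adversary_id'] = df['team_id'].map(matches).fillna(-1).astype(int)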
