After mapping data to convert it to integers, the column only shows NaN values

I was having trouble with a NaN problem where asking for df['column'] only showed NaN. I've narrowed it down to this specific part of the code, and I think it has something to do with the way I have mapped the data. Does anyone have any idea?

My code is below:

# There were some '?' values, so I wanted to make these empty so that I could later replace them with the mean once I'd converted everything to integer
df['country_code'] = df['country_code'].replace(['?'], )
country_code_map = {'AUS': 1, 'USA': 2, 'CAN': 3, 'BGD': 4, 'BRZ': 5, 'JP': 6, 'ID': 7, 'HR': 8, 'CH': 9, 'FRA': 10, 'FIN': 11}
df['country_code'] = df['country_code'].map(country_code_map)
df['country_code'] = pd.to_numeric(df['country_code'])
df['country_code'] = df['country_code'].replace([''], df['country_code'].mean)

Let me know if any extra info is required.

CodePudding user response：

I've created df['country_code'] in the following way; you should have something similar:

import pandas as pd

d = {'country_code': ["?", "BRZ", "USA"]}
df = pd.DataFrame(data=d)
print(df)

Output:

  country_code
0            ?
1          BRZ
2          USA

Now if I execute your code, this is what I get:

   country_code
0           NaN
1           5.0
2           2.0

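You're getting a NaN value in the output instead of the column mean for the following reason.

Let's take a look at the first two steps, the replace and the map:

df['country_code'] = df['country_code'].replace(['?'], )
df['country_code'] = df['country_code'].map(country_code_map)
print(df)

Output:

   country_code
0           NaN
1           5.0
2           2.0

Here the ? never becomes an empty string. replace is called without a replacement value, so it cannot insert '', and map then returns NaN for every value that is not a key in country_code_map, which includes ?.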
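To see where the NaN comes from, here is a minimal sketch that assumes the three-row example above and a shortened, illustrative map. It shows map returning NaN for a value missing from the dictionary, and that replacing '' leaves that NaN untouched while fillna fills it.

```python
import pandas as pd

# Three-row example column from the answer above
codes = pd.Series(["?", "BRZ", "USA"], name="country_code")
country_code_map = {"USA": 2, "BRZ": 5}  # shortened map, for illustration only

# '?' is not a key in the dictionary, so map() returns NaN for that row
mapped = codes.map(country_code_map)
print(mapped.isna().tolist())

# Replacing '' changes nothing: NaN is not an empty string
still_missing = mapped.replace("", 0)
print(int(still_missing.isna().sum()))

# fillna() targets the NaN directly
filled = mapped.fillna(0)
print(int(filled.isna().sum()))
```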

So when you get to the last line, you're trying to replace empty strings '', but the column contains NaNs. Use fillna instead to fill the NaNs, and call mean() with parentheses so that the mean value is passed rather than the method itself, like this:

import numpy as np

# Mark '?' as missing explicitly; map() would also turn the unmapped '?' into NaN
df['country_code'] = df['country_code'].replace('?', np.nan)
country_code_map = {'AUS': 1, 'USA': 2, 'CAN': 3, 'BGD': 4, 'BRZ': 5, 'JP': 6, 'ID': 7, 'HR': 8, 'CH': 9, 'FRA': 10, 'FIN': 11}
df['country_code'] = df['country_code'].map(country_code_map)
df['country_code'] = pd.to_numeric(df['country_code'])
# Fill the NaNs with the mean of the mapped values
df['country_code'] = df['country_code'].fillna(df['country_code'].mean())

Output:

   country_code
0           3.5
1           5.0
2           2.0
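One detail that is easy to miss is the difference between mean and mean(). The code in the question passes df['country_code'].mean without parentheses, which is the method object rather than a number. A small sketch, assuming the same three values as the output above:

```python
import pandas as pd

# Mapped column from the example, with the '?' row already NaN
col = pd.Series([float("nan"), 5.0, 2.0], name="country_code")

# Without parentheses this is the bound method, not a number
print(callable(col.mean))

# With parentheses the method runs; NaN is skipped by default (skipna=True)
mean_value = col.mean()
print(mean_value)

# The computed value is what fillna() needs
filled = col.fillna(mean_value)
print(filled.tolist())
```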

CodePudding user response：

I realised the issue was in my mapping and the conversion to integer. The conversion happens automatically once I have mapped the data, so the separate step is not needed.

Therefore the code should look like this:

country_code_map = {'AUS': 1, 'USA': 2, 'CAN': 3, 'BGD': 4, 'BRZ': 5, 'JP': 6, 'ID': 7, 'HR': 8, 'CH': 9, 'FRA': 10, 'FIN': 11}
df['country_code'] = df['country_code'].map(country_code_map)

Then I can check the mean without getting the NaN values as I was before:

df['country_code'].mean()
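As a quick check of the point about automatic conversion, the sketch below uses a shortened, illustrative map to look at the dtype that map returns. Its last step shows one way a column can end up entirely NaN: running the map a second time on values that are already numbers. That is only a possible cause, not something the question confirms.

```python
import pandas as pd

country_code_map = {"USA": 2, "BRZ": 5}  # shortened map, for illustration only

# Every value has a key: the result is already integer-typed
complete = pd.Series(["BRZ", "USA"]).map(country_code_map)
print(complete.dtype)

# A value without a key ('?') becomes NaN, which makes the column float
partial = pd.Series(["?", "BRZ", "USA"]).map(country_code_map)
print(partial.dtype)
print(partial.mean())

# Mapping again finds no string keys among the numbers, so every row is NaN
remapped = partial.map(country_code_map)
print(bool(remapped.isna().all()))
```

If the mapping lives in a notebook cell, re-running that cell on the already-mapped column has the same effect as the last step here.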