I have a dataframe that looks like this:
import pandas as pd

data = {'Region': ['Africa','Africa','Africa','Africa','Africa','Africa','Africa','Africa','Asia','Asia','Asia','Asia'],
'Country': ['South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','Japan','Japan','Japan','Japan'],
'Product': ['ABC','ABC','ABC','ABC','XYZ','XYZ','XYZ','XYZ','DEF','DEF','DEF','DEF'],
'Year': [2016, 2017, 2018, 2019,2016, 2017, 2018, 2019,2016, 2017, 2018, 2019],
'Price': [500, 400, 0,450,750,0,0,890,0,0,415,0],
'Quantity': [1200,1700,0,330,500,0,0,120,300,0,50,0],
'Value': [600000,680000,0,148500,350000,0,0,106800,0,0,20750,0]}
df = pd.DataFrame(data)
I want to replace all the numeric values (i.e., those in the columns Year, Price, Quantity and Value) with NaN, but I can't figure out a good way to do it.
Answer:
Select the numeric columns with DataFrame.select_dtypes and assign NaN to them:
import numpy as np

# select_dtypes(np.number) matches every int and float column
df[df.select_dtypes(np.number).columns] = np.nan
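A minimal, self-contained sketch of this first approach, using a shortened two-row copy of the question's data. Because NaN is a float value, the former integer columns end up as float64 after the assignment.

```python
import numpy as np
import pandas as pd

# Shortened version of the question's data (one Africa row, one Asia row)
df = pd.DataFrame({
    'Region': ['Africa', 'Asia'],
    'Country': ['South Africa', 'Japan'],
    'Product': ['ABC', 'DEF'],
    'Year': [2016, 2019],
    'Price': [500, 0],
    'Quantity': [1200, 0],
    'Value': [600000, 0],
})

# Names of every column whose dtype is numeric; column order is preserved
num_cols = df.select_dtypes(np.number).columns
print(list(num_cols))  # ['Year', 'Price', 'Quantity', 'Value']

# Broadcast the scalar NaN into all of those columns at once
df[num_cols] = np.nan
print(df)
```

This only looks at each column's dtype, so a column that holds numbers stored as strings (dtype object) is left untouched.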
Or, if some columns may contain numbers stored as strings (so their dtype is not numeric), try converting every value with to_numeric and use DataFrame.where to set NaN wherever the conversion succeeded:
df = df.where(df.apply(pd.to_numeric, errors='coerce').isna())
print(df)
    Region       Country  Product  Year  Price  Quantity  Value
0   Africa  South Africa      ABC   NaN    NaN       NaN    NaN
1   Africa  South Africa      ABC   NaN    NaN       NaN    NaN
2   Africa  South Africa      ABC   NaN    NaN       NaN    NaN
3   Africa  South Africa      ABC   NaN    NaN       NaN    NaN
4   Africa  South Africa      XYZ   NaN    NaN       NaN    NaN
5   Africa  South Africa      XYZ   NaN    NaN       NaN    NaN
6   Africa  South Africa      XYZ   NaN    NaN       NaN    NaN
7   Africa  South Africa      XYZ   NaN    NaN       NaN    NaN
8     Asia         Japan      DEF   NaN    NaN       NaN    NaN
9     Asia         Japan      DEF   NaN    NaN       NaN    NaN
10    Asia         Japan      DEF   NaN    NaN       NaN    NaN
11    Asia         Japan      DEF   NaN    NaN       NaN    NaN
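A small sketch of why the to_numeric/where variant helps, assuming a hypothetical variant of the data in which Price is an object column mixing real numbers, numbers stored as strings and a text label. select_dtypes only sees Year as numeric here, while the value-by-value test also catches the string '415' and keeps the label 'n/a'.

```python
import pandas as pd

# Hypothetical data: Price is dtype object because it mixes ints and strings
df = pd.DataFrame({
    'Country': ['South Africa', 'Japan', 'Japan'],
    'Year': [2016, 2017, 2018],
    'Price': [500, '415', 'n/a'],
})

# Dtype-based selection misses the object column entirely
print(df.select_dtypes('number').columns.tolist())  # ['Year']

# to_numeric(errors='coerce') turns anything it cannot parse into NaN,
# so isna() is True exactly where the original value was NOT numeric
keep = df.apply(pd.to_numeric, errors='coerce').isna()

# where() keeps values where the mask is True and writes NaN elsewhere
out = df.where(keep)
print(out)
```

The trade-off is cost: this converts every cell of every column, whereas select_dtypes only inspects column dtypes.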