I am working with the pandas function, and I am trying to interpolate a missing value after removing a value that isn't numeric. However, I am still reading one na
value when calling the isna().sum()
function. A better explanation is below.
The input .csv file can be found here.
Here is what I have done:
#Import modules
import pandas as pd
import numpy as np
#Import data
df = pd.read_csv('example.csv')
df.isna().sum() #Shows no NA values, but I know that one of them is not numeric.
pd.to_numeric(df['example'])
The following error is produced, indicating the presence of an entry that needs to be removed at line number 949:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~libs\lib.pyx:2315, in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "asdf"
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Input In [111], in <cell line: 3>()
1 df1 = pd.read_csv('example.csv')
2 df1.isna().sum()
----> 3 pd.to_numeric(df1['example'])
File ~numeric.py:184, in to_numeric(arg, errors, downcast)
182 coerce_numeric = errors not in ("ignore", "raise")
183 try:
--> 184 values, _ = lib.maybe_convert_numeric(
185 values, set(), coerce_numeric=coerce_numeric
186 )
187 except (ValueError, TypeError):
188 if errors == "raise":
File ~libs\lib.pyx:2357, in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "asdf" at position 949
Here is my attempt to correct remove this value and interpolate a new one in its place:
idx_missing = df== 'asdf'
df[idx_missing] = np.nan
df['example'].isnull().sum() #This line confirms that there is one value missing
#Perform interpolation with a linear method
df1.iloc[:, -1] = df.iloc[:, -1].interpolate(method='linear') #Specifying the last column in the dataframe with the 'iloc' command
df1.isna().sum()
Apparently, there is still a missing value and the value was not interpolated:
example 1
dtype: int64
How can I correctly interpolate this value?
CodePudding user response:
If you first find and replace any value that is not a digit, that should fix your issue.
#Import modules
import pandas as pd
import numpy as np
#Import data
df = pd.read_csv('example.csv')
df['example'] = df.example.replace(r'[^\d]',np.nan,regex=True)
pd.to_numeric(df.example)