Pandas interpolation function fails to interpolate after replacing values with np.nan

Time:04-01

I am working with pandas and trying to interpolate a missing value after removing a value that isn't numeric. However, isna().sum() still reports one NA value afterwards. A more detailed explanation is below.

The input .csv file can be found here.

Here is what I have done:

#Import modules
import pandas as pd
import numpy as np

#Import data
df = pd.read_csv('example.csv')

df.isna().sum() #Shows no NA values, but I know that one of them is not numeric.

pd.to_numeric(df['example'])

The following error is produced. It shows that the entry at position 949 of the column is not numeric and needs to be removed:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~libs\lib.pyx:2315, in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "asdf"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [111], in <cell line: 3>()
      1 df1 = pd.read_csv('example.csv')
      2 df1.isna().sum()
----> 3 pd.to_numeric(df1['example'])

File ~numeric.py:184, in to_numeric(arg, errors, downcast)
    182 coerce_numeric = errors not in ("ignore", "raise")
    183 try:
--> 184     values, _ = lib.maybe_convert_numeric(
    185         values, set(), coerce_numeric=coerce_numeric
    186     )
    187 except (ValueError, TypeError):
    188     if errors == "raise":

File ~libs\lib.pyx:2357, in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "asdf" at position 949
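The error above only reports the first entry that fails to parse. You can list every such entry by coercing unparseable values to NaN with errors='coerce'. Then keep the rows that became NaN even though they were not missing in the file. This is a minimal sketch, assuming the data sits in a column named example as in the question. The inline values are a hypothetical stand-in for example.csv, which is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for example.csv (the real file is not reproduced here)
df = pd.DataFrame({"example": ["1.0", "2.0", "asdf", "4.0", "5.0"]})

# errors="coerce" turns unparseable strings into NaN instead of raising ValueError
numeric = pd.to_numeric(df["example"], errors="coerce")

# Entries that were present in the data but could not be parsed as numbers
bad_rows = df.loc[numeric.isna() & df["example"].notna(), "example"]
print(bad_rows)
```

Using the real file, this would list every non-numeric entry together with its index, not just the first one.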

Here is my attempt to remove this value and interpolate a new one in its place:

idx_missing = df == 'asdf'
df[idx_missing] = np.nan
df['example'].isnull().sum() #This line confirms that there is one value missing

#Perform interpolation with a linear method
df1.iloc[:, -1] = df.iloc[:, -1].interpolate(method='linear') #Specifying the last column in the dataframe with the 'iloc' command
df1.isna().sum()

Apparently, there is still a missing value and the value was not interpolated:

example    1
dtype: int64

How can I correctly interpolate this value?

Answer:

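After 'asdf' is replaced, the column is still stored as strings (object dtype). Linear interpolation only fills gaps in numeric columns, so the NaN is left in place. To fix this, first replace any value that is not a digit with NaN. Then convert the column to numbers with pd.to_numeric and assign the result back before interpolating.

Note that the pattern [^\d] matches any non-digit character. An entry such as 1.5 or -3 would therefore also be replaced with NaN, so this approach only suits columns of non-negative integers.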

#Import modules
import pandas as pd
import numpy as np

#Import data
df = pd.read_csv('example.csv')

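# Replace any entry containing a non-digit character (e.g. "asdf") with NaN
df['example'] = df.example.replace(r'[^\d]', np.nan, regex=True)

# to_numeric returns a new Series, so assign it back to make the column numeric
df['example'] = pd.to_numeric(df.example)

# With a numeric column, linear interpolation can fill the gap
df['example'] = df['example'].interpolate(method='linear')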
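An alternative that avoids the regex caveat is to let pd.to_numeric with errors='coerce' turn any unparseable entry into NaN. This keeps decimals and negative numbers intact and also converts the column to float64, so linear interpolation can fill the gap. This is a sketch rather than the answer author's method. The inline data is a hypothetical stand-in for example.csv.

```python
import pandas as pd

# Hypothetical stand-in for example.csv; "asdf" forces the column to a string dtype
df = pd.DataFrame({"example": ["1.0", "2.0", "asdf", "4.0", "5.0"]})
print(df["example"].dtype)  # a string/object dtype, not a numeric one

# Unparseable entries such as "asdf" become NaN and the column becomes float64
df["example"] = pd.to_numeric(df["example"], errors="coerce")

# Fill the gap linearly from its neighbours (2.0 and 4.0 give 3.0)
df["example"] = df["example"].interpolate(method="linear")

print(df["example"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(df["example"].isna().sum())  # 0
```

Because the conversion and the interpolation both operate on the same column, isna().sum() reports no remaining missing values afterwards.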