Get non-float values from specific column in pandas dataframe

I want to build a new dataframe containing the rows of an original dataframe where a specific column holds a non-real (i.e. string) value.

import pandas as pd
import numpy as np
test = {'a':[1,2,3],
        'b':[4,5,'x'],
        'c':['f','g','h']}
df_test = pd.DataFrame(test)
print(df_test)

In this example I want to get the third row, where the value in column 'b' is not numeric (it is 'x').

Answer:

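The complication is that pandas gives each column a single dtype, and a column mixing str and int values gets the generic object dtype, so a simple dtype-based selection is not possible. The individual elements do keep their own Python types, though, so I think it is necessary to iterate over the column of interest, build a boolean mask from those types, and use it to select the matching row(s).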
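A quick way to see this for yourself, using the same df_test as in the question: the column reports the object dtype, while mapping type over it shows the per-element Python types that the isinstance checks rely on.

```python
import pandas as pd

test = {'a': [1, 2, 3],
        'b': [4, 5, 'x'],
        'c': ['f', 'g', 'h']}
df_test = pd.DataFrame(test)

# The column as a whole gets the generic object dtype...
print(df_test['b'].dtype)               # object
# ...but each element keeps its own Python type
print(df_test['b'].map(type).tolist())  # [<class 'int'>, <class 'int'>, <class 'str'>]
```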

mask = []
for j in df_test['b']:
    # True for string entries, False for everything else
    mask.append(isinstance(j, str))

print(df_test[mask])

which produces

   a  b  c
2  3  x  h
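Since the question asks for the matching rows as a new dataframe, the mask can be wrapped in a small function. This is a minimal sketch; rows_with_str is a hypothetical helper name, not part of pandas, and it uses a list comprehension that is equivalent to the loop above.

```python
import pandas as pd


def rows_with_str(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: return the rows of df whose value in col is a str."""
    mask = [isinstance(v, str) for v in df[col]]
    # .copy() makes the new frame explicitly independent of df
    # (avoids SettingWithCopyWarning on later edits in older pandas versions)
    return df[mask].copy()


df_test = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 'x'], 'c': ['f', 'g', 'h']})
new_df = rows_with_str(df_test, 'b')
print(new_df)
```

The original df_test is left unchanged; new_df keeps the original index label (2) of the selected row.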

Answer:

For this type of problem you'll need to build a boolean mask element-wise, for example with a list comprehension or an element-wise apply. You can use any of the following approaches (you should see similar performance for all of them).

isinstance .apply

mask = df_test['b'].apply(isinstance, args=(str, ))

print(df_test.loc[mask])
   a  b  c
2  3  x  h
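Because .apply returns a boolean Series here, the same mask can also be inverted with ~ to get the complementary rows (those whose 'b' value is not a string). A short sketch on the question's df_test:

```python
import pandas as pd

df_test = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 'x'], 'c': ['f', 'g', 'h']})

# Boolean Series: True where the value in 'b' is a str
mask = df_test['b'].apply(isinstance, args=(str,))

string_rows = df_test.loc[mask]   # the row with 'x'
other_rows = df_test.loc[~mask]   # the rows with numeric 'b' values
print(other_rows)
```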

isinstance list comprehension

mask = [isinstance(v, str) for v in df_test['b']]

print(df_test.loc[mask])
   a  b  c
2  3  x  h
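Checking for str only catches strings. If "non-real" should mean anything that is not a real number, one option (an assumption about the intended meaning, not something the answers above do) is to test against the standard library's numbers.Real abstract base class instead:

```python
import numbers

import pandas as pd

df_test = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 'x'], 'c': ['f', 'g', 'h']})

# True for anything that is not a real number (str, None, lists, ...).
# Note: bool is a subclass of int, so True/False count as real numbers here,
# and NaN is a float, so it is not flagged either.
mask = [not isinstance(v, numbers.Real) for v in df_test['b']]
print(df_test.loc[mask])
```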

coerce to numeric and find NaNs

mask = pd.to_numeric(df_test['b'], errors='coerce').isna()

print(df_test.loc[mask])
   a  b  c
2  3  x  h
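The three approaches agree on df_test, but they are not equivalent in general: the isinstance masks test each element's Python type, while the to_numeric mask flags anything that cannot be parsed as a number. A small sketch with a made-up Series shows the difference: a numeric string such as '7' is a str but parses as a number, and a missing value (None) is not a str but coerces to NaN.

```python
import pandas as pd

s = pd.Series(['7', 'x', None, 3.5])

# isinstance only looks at the Python type of each element
by_type = s.apply(isinstance, args=(str,))
# to_numeric looks at whether each value can be parsed as a number
by_parse = pd.to_numeric(s, errors='coerce').isna()

print(pd.DataFrame({'value': s, 'is_str': by_type, 'not_numeric': by_parse}))
```

Which mask is right depends on whether strings like '7' should count as numeric and whether missing values should be selected.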