Get non-float values from specific column in pandas dataframe

Time:12-03

I want to build a new dataframe containing the rows of an original dataframe where a specific column holds a non-numeric (e.g. string) value.

import pandas as pd
import numpy as np
test = {'a':[1,2,3],
        'b':[4,5,'x'],
        'c':['f','g','h']}
df_test = pd.DataFrame(test)
print(df_test)

I want to get the third row where the value in 'b' column is not numeric (it is 'x').

CodePudding user response:

The complication is that pandas stores each column with a single dtype (object for a mix of str and int), so a simple numeric comparison cannot pick out the strings. One way around this, I think, is to iterate over the column of interest, build a mask marking the matching row(s), and then extract them.

# Build a boolean mask: True where the value in 'b' is a string
mask = []
for j in df_test['b']:
    mask.append(isinstance(j, str))

print(df_test[mask])

which produces

   a  b  c
2  3  x  h
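
To see the object dtype described above, the column's dtype and the Python type of each element can be inspected directly. This is only a small sketch on the question's df_test; the names col_dtype, elem_types and result are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df_test = pd.DataFrame({'a': [1, 2, 3],
                        'b': [4, 5, 'x'],
                        'c': ['f', 'g', 'h']})

# A column mixing int and str is stored with dtype object
col_dtype = df_test['b'].dtype

# Each element still carries its own Python type (int, int, str here)
elem_types = df_test['b'].map(type)

# Same mask as the loop, built with an element-wise map
mask = df_test['b'].map(lambda v: isinstance(v, str))
result = df_test[mask]

print(col_dtype)
print(elem_types.tolist())
print(result)
```

The map call does the same per-element isinstance check as the loop, so it should select the same single row.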

CodePudding user response:

You'll need to perform some type of list comprehension or element-wise apply and build a boolean mask for this type of problem. You can use any of the following approaches (you should see similar performance for all).

isinstance .apply

mask = df_test['b'].apply(isinstance, args=(str, ))

print(df_test.loc[mask])
   a  b  c
2  3  x  h

isinstance list comprehension

mask = [isinstance(v, str) for v in df_test['b']]

print(df_test.loc[mask])
   a  b  c
2  3  x  h

coerce to numeric and find nans

mask = pd.to_numeric(df_test['b'], errors='coerce').isna()

print(df_test.loc[mask])
   a  b  c
2  3  x  h
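
The three approaches agree on the question's data, but they are not equivalent in general. The sketch below uses a hypothetical Series (not from the original post) with a numeric-looking string and a missing value to show where the isinstance check and the to_numeric check differ; the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with edge cases that the original data does not contain
s = pd.Series([4, 5.5, 'x', '7', np.nan], dtype=object)

# isinstance check: flags every string, including the numeric-looking '7'
is_str = s.apply(isinstance, args=(str,))

# to_numeric check: flags values that cannot be parsed, and also existing NaN
not_parsable = pd.to_numeric(s, errors='coerce').isna()

# Excluding values that were already missing leaves only failed conversions
failed_only = not_parsable & s.notna()

print(is_str.tolist())        # [False, False, True, True, False]
print(not_parsable.tolist())  # [False, False, True, False, True]
print(failed_only.tolist())   # [False, False, True, False, False]
```

Which mask is appropriate depends on whether strings such as '7' should count as numeric and whether missing values should be selected.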
