Drop rows in a dataframe based on type of the entry

Time:11-28

Suppose I have a dataframe x with a column terms. The terms are supposed to be strings, but some of them are numbers, so I want to delete the rows of the dataframe whose terms value is an integer or a float. I tried the following but got a KeyError:

x = x.drop(x[type(x['terms']) is int].index) 

How should I change the code?
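For reference, here is a minimal sketch of why the attempt fails and of an element-wise check that fixes it. type(x['terms']) is the type of the whole column (a pandas Series), not of each entry, so comparing it with int gives a single False. x[False] is then treated as a lookup for a column literally named False, which raises the KeyError. The toy data below is made up for illustration. The sketch assumes numbers.Number is an acceptable test for "integer or float" (it also matches NumPy numeric scalars and bool).

```python
import numbers

import pandas as pd

# Illustrative data (made up): numbers and strings mixed in one column
x = pd.DataFrame({'terms': [13, 0.23, 'hello', 'world']})

# type() sees the whole Series, so this is one bool, not one per row
mask = type(x['terms']) is int
print(mask)  # False

# x[False] is treated as a column lookup for a column named False
try:
    x[mask]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Element-wise check: one True/False per entry
is_number = x['terms'].map(lambda v: isinstance(v, numbers.Number))

# Same drop() call as the original attempt, now with a per-row mask
x = x.drop(x[is_number].index)
print(x)
```

Unlike the pd.to_numeric approach, this keeps strings that merely look like numbers (such as '12'), because it checks the stored type and not the text.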

Answer:

Use pd.to_numeric with errors='coerce' and keep only the rows where the conversion fails:

import pandas as pd

df = pd.DataFrame({'terms': [13, 0.23, 'hello', 'world', '12', '0.45']})
# keep only the rows whose value cannot be converted to a number
df = df[pd.to_numeric(df['terms'], errors='coerce').isna()]
print(df)

# Output:
   terms
2  hello
3  world

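Details: with errors='coerce', any value that cannot be parsed as a number becomes NaN instead of raising an error. Numeric strings such as '12' and '0.45' are parsed too, so those rows are dropped along with the real numbers: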

>>> df
   terms
0     13
1   0.23
2  hello
3  world
4     12
5   0.45

>>> pd.to_numeric(df['terms'], errors='coerce')
0    13.00
1     0.23
2      NaN
3      NaN
4    12.00
5     0.45
Name: terms, dtype: float64
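A minimal sketch of the same mask under two assumptions that are not part of the answer: that missing values (None/NaN) in terms should be dropped as well, and that you prefer the drop() form used in the question. Missing entries also coerce to NaN, so isna() alone would keep them; the extra notna() check removes them. The data is illustrative.

```python
import pandas as pd

# Example data (illustrative), including a missing value
df = pd.DataFrame({'terms': [13, 0.23, 'hello', 'world', '12', '0.45', None]})

# NaN after coercion means the value could not be parsed as a number
coerced = pd.to_numeric(df['terms'], errors='coerce')

# Keep a row only if it failed to parse AND was not missing to begin with
keep = coerced.isna() & df['terms'].notna()

# Equivalent to df[keep], written with drop() like the original attempt
result = df.drop(df[~keep].index)
print(result)
```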

Answer:

Say you have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'a':[1,'sd','sf',2,5,'13','s','143f','d234f','z24']})
#            notice 13 is a string here ^^^^

       a
0      1
1     sd
2     sf
3      2
4      5
5     13
6      s
7   143f
8  d234f
9    z24

If you want to get rid of every entry that looks like a whole number, whether it is stored as a number (1, 2, 5) or as a digit-only string ('13'), use this:

df = df[~df['a'].astype(str).str.isdigit()]

Output:

>>> df
       a
1     sd
2     sf
6      s
7   143f
8  d234f
9    z24
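A quick sketch of one limitation of the str.isdigit filter: it is True only when every character is a digit, so decimals such as 0.23 and negative numbers such as '-5' are not recognized as numbers. The series below is made-up sample data, and pd.to_numeric from the first answer is shown alongside for comparison.

```python
import pandas as pd

# Sample data (illustrative): ints, floats, numeric strings and plain text
s = pd.Series([1, '13', 0.23, '0.45', '-5', 'abc'])

# str.isdigit is True only when every character is a digit 0-9
digit_only = s.astype(str).str.isdigit()
print(digit_only.tolist())  # [True, True, False, False, False, False]

# pd.to_numeric also recognises decimal points and signs
numeric_like = pd.to_numeric(s, errors='coerce').notna()
print(numeric_like.tolist())  # [True, True, True, True, True, False]
```

If your column may contain decimals or negative values, the isdigit filter will keep them, which may not be what you want.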

If instead you want to get rid of only the entries that are not strings at all, use this (applied to the original dataframe, not the one filtered above):

df = df[df['a'].transform(type).eq(str)]

Output:

>>> df
       a
1     sd
2     sf
5     13  <--- Notice how the string '13' is kept
6      s
7   143f
8  d234f
9    z24
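The same type-based filter can also be written with isinstance instead of comparing types with eq. Here is a minimal sketch using this answer's example dataframe. One difference worth knowing: isinstance also accepts subclasses of str, whereas a type comparison with eq(str) does not. Which behaviour you want is a judgment call.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 'sd', 'sf', 2, 5, '13', 's', '143f', 'd234f', 'z24']})

# One bool per entry: True when the value is a str (subclasses included)
is_str = df['a'].map(lambda v: isinstance(v, str))

# Keep only the string rows; '13' survives because it is stored as a str
strings_only = df[is_str]
print(strings_only)
```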