Drop rows containing string Pandas

Time:08-11

I am trying to remove the rows that contain a specific string in one particular column of a DataFrame.

I thought of combining the drop and iloc methods, because the column names are rather long and subject to change, so I would rather not reference the columns by name. However, I have not been able to combine the two into a function that takes the string as a parameter.

As an example, let's say I have the following dataframe:

    Nome    Nota
0   a   1.000000
1   b   1.250000
2   c   1.375000
3   d   1.437500
4   e   1.468750
5   f   1.484375
6   g   1.492188
7   h   1.496094
8   i   1.498047
9   j   1.499023
10  k   1.499512
11  l   1.499756
12  m   1.499878
13  n   1.499939
14  o   1.499969
15  p   1.499985
16  q   1.499992
17  r   1.499996
18  s   1.499998

Suppose I want to drop every row whose first column contains the string 'm'. I tried:

testdf.drop(testdf.columns[0] == 'm',inplace = True)

but it gave me the error message:

KeyError: '[False] not found in axis'

What am I getting wrong here?

Answer:

Use Boolean indexing:

first_col = testdf.columns[0]  # label of the first column, whatever it is called
testdf = testdf[testdf[first_col] != 'm']  # keep only rows whose first column is not 'm'
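Since the question wants to avoid typing column names at all, the same Boolean indexing can select the column by position with iloc. A minimal sketch, assuming the first column holds the strings to filter on; the small sample frame is illustrative, not the question's full data:

```python
import pandas as pd

# Illustrative sample frame shaped like the question's data
testdf = pd.DataFrame({'Nome': ['a', 'b', 'm', 'n'],
                       'Nota': [1.0, 1.25, 1.499878, 1.499939]})

# iloc[:, 0] selects the first column by position, so no column name is needed
mask = testdf.iloc[:, 0] != 'm'

# Boolean indexing keeps only the rows where the mask is True
testdf = testdf[mask]
print(testdf)
```

The original index labels are kept (here 0, 1 and 3); call reset_index(drop=True) afterwards if you want them renumbered.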

Answer:

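In this case, testdf.columns[0] == "m" does not look at the values in the column at all: testdf.columns[0] is just the column label (the string 'Nome'), so the expression evaluates to a single False. drop then treats False as a row label to remove, which is why you get KeyError: '[False] not found in axis'. What you want instead is to compare the column's values, which gives one truth value per row, and use that Boolean Series as a mask to index the DataFrame, keeping the rows that are not equal to "m". You can do so using this line of code:

testdf = testdf[testdf["Nome"] != "m"]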

Hope this helps.
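If you specifically want to keep the drop plus iloc combination from the question, drop needs the index labels of the rows to remove, not a Boolean. A minimal sketch, assuming the target column is the first one; the sample data is illustrative:

```python
import pandas as pd

# Illustrative sample frame shaped like the question's data
testdf = pd.DataFrame({'Nome': ['a', 'b', 'm', 'n'],
                       'Nota': [1.0, 1.25, 1.499878, 1.499939]})

# The original attempt compares the column *label* ('Nome') with 'm',
# which yields one plain bool rather than one value per row
print(testdf.columns[0] == 'm')  # False

# Collect the index labels of the rows whose first column equals 'm'
rows_to_drop = testdf[testdf.iloc[:, 0] == 'm'].index

# Pass those labels to drop; inplace=True modifies testdf directly
testdf.drop(rows_to_drop, inplace=True)
print(testdf)
```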

Answer:

Try this:

import pandas as pd
df = pd.DataFrame({'Nome' : ['a','m','c','m'],
                   'Nota' : [1.0, 1.1, 1.2, 1.3]})

# Keep the rows whose 'Nome' is not 'm', then renumber the index from 0
df = df.loc[df['Nome'] != 'm'].reset_index(drop=True)
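The question also asks for a function that takes the string as a parameter. A minimal sketch wrapping the same filtering in a helper; `drop_rows_equal_to` and its `col_pos` parameter are hypothetical names chosen for illustration, and the helper assumes an exact (whole-value) match:

```python
import pandas as pd


def drop_rows_equal_to(df: pd.DataFrame, value: str, col_pos: int = 0) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df without the rows whose
    column at position col_pos equals value, with the index renumbered."""
    keep = df.iloc[:, col_pos] != value  # Boolean mask, one entry per row
    return df.loc[keep].reset_index(drop=True)


df = pd.DataFrame({'Nome': ['a', 'm', 'c', 'm'],
                   'Nota': [1.0, 1.1, 1.2, 1.3]})
result = drop_rows_equal_to(df, 'm')
print(result)
```

Returning a new DataFrame instead of modifying the argument in place leaves the caller's df untouched, so the caller decides whether to reassign it.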

Answer:

You could specify a filter like this:

filter = df['Nome'] != 'm'

This produces a Boolean Series; note that the value at index 12 is False:

0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12    False
13     True
14     True
15     True
16     True
17     True
18     True
Name: Nome, dtype: bool

Then apply the filter to the DataFrame, and the row at index 12 is removed:

df = df[filter]
print(df)

   Nome      Nota
0     a  1.000000
1     b  1.250000
2     c  1.375000
3     d  1.437500
4     e  1.468750
5     f  1.484375
6     g  1.492188
7     h  1.496094
8     i  1.498047
9     j  1.499023
10    k  1.499512
11    l  1.499756
13    n  1.499939
14    o  1.499969
15    p  1.499985
16    q  1.499992
17    r  1.499996
18    s  1.499998
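All of the approaches above test for an exact match. If "containing" should instead mean that the string appears anywhere inside the value, Series.str.contains can build the mask. A minimal sketch; the sample names are made up for illustration:

```python
import pandas as pd

# Illustrative data where some values contain 'm' as a substring
df = pd.DataFrame({'Nome': ['ana', 'maria', 'joao', 'tom'],
                   'Nota': [1.0, 1.25, 1.375, 1.4375]})

# regex=False treats 'm' as a literal substring;
# na=False treats missing values as "does not contain"
contains_m = df.iloc[:, 0].str.contains('m', regex=False, na=False)

# Keep only the rows whose first column does NOT contain the substring
df = df[~contains_m]
print(df)
```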