I am trying to remove the rows of a dataframe that have a specific string in one particular column.
I thought of combining the drop and iloc methods, because the column names are long and liable to change, so I would rather not reference columns by name. However, I have not been able to combine the two into a function that takes the string as a parameter.
As an example, let's say I have the following dataframe:
Nome Nota
0 a 1.000000
1 b 1.250000
2 c 1.375000
3 d 1.437500
4 e 1.468750
5 f 1.484375
6 g 1.492188
7 h 1.496094
8 i 1.498047
9 j 1.499023
10 k 1.499512
11 l 1.499756
12 m 1.499878
13 n 1.499939
14 o 1.499969
15 p 1.499985
16 q 1.499992
17 r 1.499996
18 s 1.499998
Let's say I would like to drop every row that has the string 'm' in the first column. I tried:
testdf.drop(testdf.columns[0] == 'm',inplace = True)
but it gave me the error message:
KeyError: '[False] not found in axis'
What am I getting wrong here?
CodePudding user response:
Use Boolean indexing:
first_col = testdf.columns[0]
testdf = testdf[testdf[first_col] != 'm']
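Since the question asks for a function that takes the string as a parameter and avoids column names, here is a minimal sketch of one way to wrap the idea above. The helper name drop_rows_equal and its col_pos parameter are made up for illustration, and the small testdf is a shortened stand-in for the dataframe in the question.

```python
import pandas as pd


def drop_rows_equal(df: pd.DataFrame, value: str, col_pos: int = 0) -> pd.DataFrame:
    """Return a new DataFrame without the rows whose column at col_pos equals value."""
    # iloc selects the column by position, so its name is never needed
    mask = df.iloc[:, col_pos] != value
    return df[mask]


# Shortened stand-in for the dataframe in the question
testdf = pd.DataFrame({"Nome": ["a", "m", "c"], "Nota": [1.0, 1.25, 1.375]})
result = drop_rows_equal(testdf, "m")
```

The function returns a new DataFrame and leaves the original untouched, so assign the result back if you want to replace testdf.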
CodePudding user response:
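In this case, testdf.columns[0] == "m" compares the name of the first column (the string "Nome") with "m", so it evaluates to the single value False rather than to one truth value per row. drop then looks for a row labelled False, which is why you get the KeyError. What you want instead is to compare the values in the column, which gives one truth value per row, and use that result as an index into the DataFrame, keeping only the rows that are not equal to "m". You can do so using this line of code.
testdf = testdf[testdf["Nome"] != "m"]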
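To see the difference between comparing the column label and comparing the column values, a small sketch may help. The three-row testdf and the variable names label_check and row_check are assumptions for illustration only.

```python
import pandas as pd

# Shortened stand-in for the dataframe in the question
testdf = pd.DataFrame({"Nome": ["a", "m", "c"], "Nota": [1.0, 1.25, 1.375]})

# testdf.columns[0] is the column *name*, i.e. the string "Nome",
# so this comparison yields one plain Python bool
label_check = testdf.columns[0] == "m"

# Comparing the column's *values* yields a Boolean Series, one entry per row
row_check = testdf[testdf.columns[0]] == "m"
```

Passing label_check to drop asks pandas to remove a row labelled False, which does not exist, whereas row_check (or its negation) can be used directly inside testdf[...].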
Hope this helps.
CodePudding user response:
Try this:
import pandas as pd
df = pd.DataFrame({'Nome' : ['a','m','c','m'],
'Nota' : [1.0, 1.1, 1.2, 1.3]})
# keep the rows that are not 'm', renumber the index, and assign the result back
df = df.loc[df['Nome'] != 'm'].reset_index(drop = True)
CodePudding user response:
You could specify a filter like this:
filter = df['Nome'] != 'm'
This produces a Boolean Series with one value per row; note that index 12 is False:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 False
13 True
14 True
15 True
16 True
17 True
18 True
Name: Nome, dtype: bool
After that, apply the filter to the dataframe, and the row at index 12 will be removed:
df = df[filter]
print(df)
Nome Nota
0 a 1.000000
1 b 1.250000
2 c 1.375000
3 d 1.437500
4 e 1.468750
5 f 1.484375
6 g 1.492188
7 h 1.496094
8 i 1.498047
9 j 1.499023
10 k 1.499512
11 l 1.499756
13 n 1.499939
14 o 1.499969
15 p 1.499985
16 q 1.499992
17 r 1.499996
18 s 1.499998
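If you would still rather use drop together with iloc, as in the original attempt, one possible sketch is to first look up the index labels of the matching rows and pass those to drop. The four-row df and the variable name labels are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Nome": ["a", "m", "c", "m"], "Nota": [1.0, 1.1, 1.2, 1.3]})

# Index labels of the rows whose first column (selected by position) equals "m"
labels = df[df.iloc[:, 0] == "m"].index

# drop() expects row labels, not Booleans; it returns a new DataFrame
df = df.drop(labels)
```

As with the filter above, the remaining rows keep their original index labels; chain reset_index(drop=True) if you want them renumbered.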