Filter rows in df that do not startswith pattern

Time:04-17

I have a DataFrame in Python, and I want to remove the rows whose value in the column "variable" begins with "V10280", "V10281" or "V10282".

variable
Capital
RM_RIDE
...
V1028196
V1028197
V1028198
V1028199
V1028200

I was thinking of something like this

a = a.loc[a['variable'].str.startswith(("V10280", "V10281", "V10282"))]

but with "not" or "!", such as

a = a.loc[a['variable'].str.startswith(!("V10280", "V10281", "V10282"))]

but this doesn't work.

Thanks!!

CodePudding user response:

Try negating the mask with ~:

a = a.loc[~a['variable'].str.startswith(("V10280", "V10281", "V10282"))]
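One caveat worth sketching: if the column contains missing values, the mask returned by str.startswith may not be purely boolean, and inverting it with ~ can fail. The example below assumes a hypothetical column with one missing entry (not part of the original question) and passes na=False so that missing values are treated as "does not start with a prefix".

```python
import pandas as pd

# Hypothetical sample data with one missing value (an assumption, not the asker's data).
a = pd.DataFrame({"variable": ["Capital", None, "V1028196", "V1028200", "RM_RIDE"]})

prefixes = ("V10280", "V10281", "V10282")

# na=False makes missing values count as "no match",
# so the mask is purely boolean and can be inverted with ~.
mask = a["variable"].str.startswith(prefixes, na=False)

# Keep only the rows that do NOT start with one of the prefixes.
filtered = a.loc[~mask]
print(filtered)
```

With na=False the row holding the missing value is kept; if it should be dropped instead, na=True would treat it as a match.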

CodePudding user response:

This should work:

import pandas as pd
a = pd.DataFrame({'variable' : 'Capital,RM_RIDE,V1028196,V1028197,V1028198,V1028199,V1028200'.split(',')})

a = a.loc[~a['variable'].str.startswith(("V10280", "V10281", "V10282"))]
print(a)

Output:

  variable
0  Capital
1  RM_RIDE

Pandas uses the ~ operator as logical 'not' for element-wise operations.
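A small sketch of the difference, using a made-up boolean Series: ~ flips every element, whereas Python's own `not` asks for a single truth value for the whole Series and raises an error. (The `!` from the question is not a Python operator at all, so it fails even earlier, as a syntax error.)

```python
import pandas as pd

# Made-up boolean mask, standing in for the result of str.startswith.
mask = pd.Series([True, False, True])

# ~ inverts each element of a boolean Series.
inverted = ~mask
print(inverted.tolist())  # [False, True, False]

# `not` needs one truth value for the whole Series, which pandas refuses to guess.
try:
    not mask
    not_raised = False
except ValueError as exc:
    not_raised = True
    print("not mask raised:", exc)
```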

Here is what the docs say about boolean indexing:

Boolean indexing

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).

Using a boolean vector to index a Series works exactly as in a NumPy ndarray.
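The parenthesization rule from the quoted docs can be illustrated with a minimal sketch. The DataFrame and its columns A and B are invented for the example; only the parenthesized forms are shown, since those are the ones the docs recommend.

```python
import pandas as pd

# Invented data for illustration only.
df = pd.DataFrame({"A": [1, 3, 5, 0], "B": [1, 2, 4, 9]})

# Each comparison is wrapped in parentheses before combining.
both = df[(df["A"] > 2) & (df["B"] < 3)]        # and
either = df[(df["A"] > 2) | (df["B"] < 3)]      # or
neither = df[~((df["A"] > 2) | (df["B"] < 3))]  # not (or)

print(both.index.tolist())     # [1]
print(either.index.tolist())   # [0, 1, 2]
print(neither.index.tolist())  # [3]
```

The same pattern applies to the original question: the str.startswith mask is a single expression, so ~ can be applied to it directly, but combining it with another condition would need parentheses around each part.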
