I have a DataFrame in Python, and I want to remove the rows whose value in column "variable" begins with "V10280", "V10281" or "V10282".
variable |
---|
Capital |
RM_RIDE |
... |
V1028196 |
V1028197 |
V1028198 |
V1028199 |
V1028200 |
I was thinking of something like this, which selects the rows that do match:
a = a.loc[a['variable'].str.startswith(("V10280", "V10281", "V10282"))]
but negated with "not" or "!", such as:
a = a.loc[a['variable'].str.startswith(!("V10280", "V10281", "V10282"))]
but this doesn't work (! is not a Python operator, so the line raises a SyntaxError).
Thanks!!
Answer 1:
Try negating the mask with ~ (element-wise not):
a = a.loc[~a['variable'].str.startswith(("V10280", "V10281", "V10282"))]
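A slightly expanded sketch of the same idea, with the intermediate mask printed. The sample data (including one missing value) is an assumption for illustration. Passing na=False to str.startswith makes missing entries count as non-matching, so the mask stays purely boolean before ~ flips it and those rows are kept.

```python
import pandas as pd

prefixes = ("V10280", "V10281", "V10282")
# Sample data based on the question, plus one missing value (assumption).
a = pd.DataFrame({"variable": ["Capital", "RM_RIDE", "V1028196", "V1028200", None]})

# True where the value starts with any of the prefixes; missing values count as False.
mask = a["variable"].str.startswith(prefixes, na=False)
print(mask.tolist())  # [False, False, True, True, False]

# ~ flips the mask element-wise, so only the non-matching rows survive.
result = a.loc[~mask]
print(result)
```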
Answer 2:
This should work:
import pandas as pd
a = pd.DataFrame({'variable' : 'Capital,RM_RIDE,V1028196,V1028197,V1028198,V1028199,V1028200'.split(',')})
a = a.loc[~a['variable'].str.startswith(("V10280", "V10281", "V10282"))]
print(a)
Output:
variable
0 Capital
1 RM_RIDE
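An alternative that neither answer uses is a regular expression: Series.str.match anchors the pattern at the start of each string, so the three prefixes collapse into a single character class. A sketch on the same sample data, assuming every value is a string:

```python
import pandas as pd

a = pd.DataFrame({"variable": "Capital,RM_RIDE,V1028196,V1028197,V1028198,V1028199,V1028200".split(",")})

# str.match anchors at the start of each value: "V1028" followed by 0, 1 or 2.
mask = a["variable"].str.match(r"V1028[012]")
result = a.loc[~mask]
print(result["variable"].tolist())  # ['Capital', 'RM_RIDE']
```

startswith with a tuple is simpler when the prefixes are fixed strings; the regex form is handy when the prefixes follow a pattern.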
Pandas uses the ~ operator as a logical 'not' for element-wise operations on boolean Series.
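A short sketch of why ~ is needed instead of Python's not: not asks for one truth value for the whole Series, which pandas refuses, while ~ inverts each element. On integer data ~ is bitwise inversion, so the mask should be boolean.

```python
import pandas as pd

mask = pd.Series([True, False, True])

# ~ negates each element of a boolean Series.
flipped = ~mask
print(flipped.tolist())  # [False, True, False]

# `not` needs a single bool for the whole Series, so pandas raises ValueError.
raised = False
try:
    not mask
except ValueError as err:
    raised = True
    print("ValueError:", err)

# On integers, ~ is bitwise inversion (~x == -x - 1), not logical negation.
inverted_ints = (~pd.Series([1, 0])).tolist()
print(inverted_ints)  # [-2, -1]
```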
Here is what the docs say about boolean indexing:
Boolean indexing: Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
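The precedence pitfall from the quoted docs can be reproduced with a small hypothetical frame (columns A and B and their values are made up to match the docs' expression). Without parentheses the expression becomes a chained comparison, which needs a single True/False and therefore raises.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [1, 2, 4]})

# Parenthesised: each comparison runs first, then & combines them element-wise.
both = df[(df["A"] > 2) & (df["B"] < 3)]
print(both)  # only the row with A=3, B=2

# Unparenthesised: parsed as df["A"] > (2 & df["B"]) < 3, a chained comparison
# that calls bool() on a Series and raises ValueError.
raised = False
try:
    df[df["A"] > 2 & df["B"] < 3]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```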
Using a boolean vector to index a Series works exactly as in a NumPy ndarray.
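To illustrate the NumPy parallel, a sketch with made-up values: the same boolean array selects the same elements from an ndarray and from a Series, and ~ negates it in both cases.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30, 40])
s = pd.Series(values)
keep = np.array([True, False, True, False])

print(values[keep])       # [10 30]
print(s[keep].tolist())   # [10, 30]
print(s[~keep].tolist())  # [20, 40]
```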