I am stuck on a task. I have a dataframe and I want to delete specific rows from it according to some conditions. When df[var1] > 40, we note the value of df[var2] as x.
I want to delete all the rows from the first row where df[var1] > 40 (included) up to the first row where df[var2] == x+1 (excluded). df[var2] will contain this "x" value again later on, and I want to keep those later rows. I just want to delete from that df[var2] == x row to the first df[var2] == x+1.
I am tearing my hair out trying to find some efficient code :(
I am using Python. The parts in bold are what I want to delete:
var2 : [1,1,**1,1**,2,2,2, etc ...1,1,1,2,2,etc...]
var1 : [4,12,**350,4**,18,12,etc....12,14,etc...]
I wrote this kind of code to explain what I want to do:
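to_drop, x = [], None  # rows to delete; var2 value of the run being deleted
for index, row in df.iterrows():
    if row['var2'] != x:
        x = None  # the run of var2 == x has ended
    if x is None and row['var1'] > 40:
        x = row['var2']  # start deleting from this row
    if x is not None:
        to_drop.append(index)
df = df.drop(to_drop)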
CodePudding user response:
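This should do the job:
df[(df[var1] <= 40) | (df[var2] != x+1)]
Essentially, this keeps the rows that satisfy the reverse condition. The parentheses are needed because | binds more tightly than comparison operators such as <= and !=.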
CodePudding user response:
You can use this syntax to extract data with multiple conditions by adding "&" between the conditions:
df[(df[var1] <= 40) & (df[var2] != x+1)]
# df[(condition_1) & (condition_2) &...]
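As a minimal sketch of how such masks combine, assuming a small made-up frame and x = 1 (both are illustrative assumptions, not the question's real data): each comparison yields a boolean Series, and the two are combined element-wise with &.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"var2": [1, 1, 1, 2, 2], "var1": [4, 12, 350, 18, 55]})
var1, var2 = "var1", "var2"  # column names held in variables, as in the snippet above
x = 1  # assumed: the value of var2 noted where var1 > 40

cond_1 = df[var1] <= 40      # boolean Series, one entry per row
cond_2 = df[var2] != x + 1   # boolean Series, one entry per row
kept = df[cond_1 & cond_2]   # rows where both conditions hold
print(kept)
```

Note that a row-wise mask like this looks at each row in isolation; on its own it has no notion of where a run of equal var2 values starts or ends.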
CodePudding user response:
IIUC, suppose the following dataframe:
>>> df
    var2  var1
0      1     4
1      1    12
2      1    55  # <- (1) remove from here
3      1    21
4      1    32  # <- (1) until here
5      2    23
6      2    29
7      2    49  # <- (2) remove from here
8      2    72
9      2    23  # <- (2) until here
10     1    12
11     1    49  # <- (3) remove from here
12     1    23  # <- (3) until here
Since var2 has cyclic values, you can't group on it directly; you have to create dummy groups, one per run of consecutive equal values. Within each group, flag the rows where var1 is greater than 40 and apply a cumulative sum. Only the rows where the result equals 0 are kept.
>>> df[df.groupby(df['var2'].ne(df['var2'].shift()).cumsum())['var1']
...       .transform(lambda x: x.gt(40).cumsum()).eq(0)]
    var2  var1
0      1     4
1      1    12
5      2    23
6      2    29
10     1    12
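The one-liner above can be hard to read, so here is a sketch that splits the same idea into named steps. The intermediate names run_id and hits are introduced here for illustration only; the data is the sample frame from this answer.

```python
import pandas as pd

# Sample frame from the answer above.
df = pd.DataFrame({
    "var2": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1],
    "var1": [4, 12, 55, 21, 32, 23, 29, 49, 72, 23, 12, 49, 23],
})

# 1. Label each run of consecutive equal var2 values (the "dummy groups").
#    A new label starts whenever var2 differs from the previous row.
run_id = df["var2"].ne(df["var2"].shift()).cumsum()

# 2. Within each run, count how many var1 > 40 rows have been seen so far.
hits = df["var1"].gt(40).astype(int).groupby(run_id).cumsum()

# 3. Keep only the rows that come before the first hit in their run.
result = df[hits.eq(0)]
print(result)
```

Here run_id comes out as 1 for rows 0 to 4, 2 for rows 5 to 9, and 3 for rows 10 to 12, so the second block of var2 == 1 is treated as a separate group from the first.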