Drop a part of a dataframe

Time:10-01

I am quite stuck with a task I have to do. I have a dataframe and I want to delete specific rows of it according to some conditions: when df[var1] > 40, we note the value of df[var2] at that row; call it x.

I want to delete all the rows from the first row where df[var1] > 40 (included) up to the first row where df[var2] == x + 1 (excluded). df[var2] will contain this x value again later, and those later rows I want to keep. I just want to delete from df[var2] == x to the first df[var2] == x + 1.

I am tearing my hair out trying to find efficient code :(

I am using Python. The values in bold are the ones I want to delete:

var2 : [1,1,**1,1**,2,2,2, etc ...1,1,1,2,2,etc...]
var1 : [4,12,**350,4**,18,12,etc....12,14,etc...]

I wrote this kind of code to explain what I want to do:

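to_drop, i = [], 0
while i < len(df):
    if df.var1.iloc[i] > 40:
        x = df.var2.iloc[i]  # note var2 at the first var1 > 40
        while i < len(df) and df.var2.iloc[i] == x:
            to_drop.append(df.index[i])  # drop until var2 changes
            i += 1
    else:
        i += 1
df = df.drop(to_drop)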

Answer:

This should do the job:

df[(df[var1] <= 40) | (df[var2] != x + 1)]

Essentially, this keeps the rows that satisfy the reverse condition.
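The "reverse condition" is De Morgan's law: keeping `~(A & B)` is the same as keeping `~A | ~B`. A minimal sketch checking that on a toy frame, assuming string column names `"var1"`/`"var2"` and a hypothetical `x = 1` (so `x + 1 == 2`):

```python
import pandas as pd

df = pd.DataFrame({"var2": [1, 1, 2, 2, 1],
                   "var1": [4, 55, 49, 23, 12]})
x = 1  # hypothetical value of var2 noted at the first var1 > 40

# Rows we want to remove: var1 > 40 AND var2 == x + 1
drop_cond = (df["var1"] > 40) & (df["var2"] == x + 1)

# Keeping the negation of the drop condition...
kept_negated = df[~drop_cond]
# ...is the same as keeping the reverse condition written with | (De Morgan)
kept_reverse = df[(df["var1"] <= 40) | (df["var2"] != x + 1)]

print(kept_reverse)
```

Note that the mask is evaluated row by row, so here it removes only row 2, the one row that matches both conditions itself.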

Answer:

You can use this syntax to extract data with multiple conditions by putting "&" between the conditions, each wrapped in parentheses:

df[(df[var1] <= 40) & (df[var2] != x + 1)]
# df[(condition_1) & (condition_2) &...]
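The parentheses around each condition are required: in Python, `&` and `|` bind more tightly than comparison operators such as `<=` and `!=`. A small sketch on toy data showing what happens without them:

```python
import pandas as pd

df = pd.DataFrame({"var1": [4, 55, 49], "var2": [1, 1, 2]})

# With parentheses: each comparison is evaluated first, then combined with &
ok = df[(df["var1"] <= 40) & (df["var2"] != 2)]

# Without parentheses, Python parses this as the chained comparison
# df["var1"] <= (40 & df["var2"]) != 2, which needs the truth value of a Series
try:
    df[df["var1"] <= 40 & df["var2"] != 2]
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

print(ok)
```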

Answer:

IIUC, suppose the following dataframe:

>>> df
    var2  var1
0      1     4
1      1    12
2      1    55  # <- (1) remove from here
3      1    21
4      1    32  # <- (1) until then
5      2    23
6      2    29
7      2    49  # <- (2) remove from here
8      2    72
9      2    23  # <- (2) until then
10     1    12
11     1    49  # <- (3) remove from here
12     1    23  # <- (3) until then

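Since var2 has cyclic values (the 1s come back after the 2s), you can't group on it directly; you have to create dummy groups, one per run of consecutive equal values. Within each group, flag the rows where var1 is greater than 40 and take the cumulative sum of the flags. Only rows where this sum equals 0 (i.e. rows before the first var1 > 40 of their group) are kept.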
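The steps just described can be unpacked one at a time. This sketch rebuilds the example frame above and keeps each intermediate series as a helper column; the names `run`, `flag` and `seen` are hypothetical and only there for display:

```python
import pandas as pd

df = pd.DataFrame({
    "var2": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1],
    "var1": [4, 12, 55, 21, 32, 23, 29, 49, 72, 23, 12, 49, 23],
})

steps = df.copy()
# A new run starts whenever var2 differs from the previous row; cumsum numbers the runs
steps["run"] = df["var2"].ne(df["var2"].shift()).cumsum()
# 1 where var1 exceeds the threshold, else 0
steps["flag"] = df["var1"].gt(40).astype(int)
# Running count of flagged rows within each run: stays 0 until the first var1 > 40
steps["seen"] = steps.groupby("run")["flag"].cumsum()

print(steps)
print(df[steps["seen"].eq(0)])
```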

>>> # transform returns a result aligned with df's index, so it can be used as a mask
>>> df[df.groupby(df['var2'].ne(df['var2'].shift()).cumsum())['var1']
         .transform(lambda x: x.gt(40).cumsum()).eq(0)]

    var2  var1
0      1     4
1      1    12
5      2    23
6      2    29
10     1    12
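For reuse, the one-liner can be wrapped in a function and cross-checked against a plain row-by-row loop that follows the question's wording (start dropping at the first `var1 > 40`, stop when `var2` changes). The function names and the `key`/`col`/`threshold` parameters are hypothetical, and the random frame exists only to compare the two implementations:

```python
import numpy as np
import pandas as pd


def drop_after_threshold(df, key="var2", col="var1", threshold=40):
    """Vectorised: drop rows from the first col > threshold to the end of each run of key."""
    runs = df[key].ne(df[key].shift()).cumsum()
    seen = df[col].gt(threshold).astype(int).groupby(runs).cumsum()
    return df[seen.eq(0)]


def drop_after_threshold_loop(df, key="var2", col="var1", threshold=40):
    """Reference loop: once col > threshold, drop rows until key changes."""
    to_drop, i = [], 0
    while i < len(df):
        if df[col].iloc[i] > threshold:
            x = df[key].iloc[i]
            while i < len(df) and df[key].iloc[i] == x:
                to_drop.append(df.index[i])
                i += 1
        else:
            i += 1
    return df.drop(to_drop)


rng = np.random.default_rng(0)
data = pd.DataFrame({"var2": rng.integers(1, 3, size=200),
                     "var1": rng.integers(0, 100, size=200)})

fast = drop_after_threshold(data)
slow = drop_after_threshold_loop(data)
print(fast.equals(slow))
```

The vectorised version avoids a Python-level loop over rows, which is usually what makes the difference on large frames.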