Pandas drop multiple in range value using isin-CodePudding

Given a df

I would like to drop rows if the value in column a is not within these multiple range (10-15),(25-30),(50-55), (80-85). Such that these range are made from the 'lbotandltop`

lbot =[10, 25, 50, 80]
ltop=[15, 30, 55, 85]

I am thinking this can be achieve via pandas isin

df[df['a'].isin(list(zip(lbot,ltop)))]

But, it return empty df instead.

The expected output is

CodePudding user response：

You can use numpy broadcasting to create a boolean mask where for each row it returns True if the value is within any of the ranges and filter df with it.:

out = df[((df[['a']].to_numpy() >=lbot) & (df[['a']].to_numpy() <=ltop)).any(axis=1)]

Output:

CodePudding user response：

Create values in flatten list comprehension with range:

df = df[df['a'].isin([z for x, y in zip(lbot,ltop) for z in range(x, y 1)])]
print (df)
     a
4   10
5   11
8   26
9   51
11  83
12  82
13  85

Or use np.concatenate for flatten list of ranges:

df = df[df['a'].isin(np.concatenate([range(x, y 1) for x, y in zip(lbot,ltop)]))]

CodePudding user response：

A method that uses between():

df[pd.concat([df['a'].between(x, y) for x,y in zip(lbot, ltop)], axis=1).any(axis=1)]

output:

CodePudding user response：

If your values in the two lists are sorted, a method that doesn't require any loop would be to use pandas.cut and checking that you obtain the same group cutting on the two lists:

# group based on lower bound
id1 = pd.cut(df['a'], bins=lbot [float('inf')], labels=range(len(lbot)),
             right=False) # include lower bound
# group based on upper bound
id2 = pd.cut(df['a'], bins=[0] ltop, labels=range(len(ltop)))

# ensure groups are identical
df[id1.eq(id2)]

output:

intermediate groups:

     a  id1  id2
0    1  NaN    0
1    2  NaN    0
2    1  NaN    0
3    7  NaN    0
4   10    0    0
5   11    0    0
6   21    0    1
7   22    0    1
8   26    1    1
9   51    2    2
10  56    2    3
11  83    3    3
12  82    3    3
13  85    3    3
14  90    3  NaN