Given a df
a
0 1
1 2
2 1
3 7
4 10
5 11
6 21
7 22
8 26
9 51
10 56
11 83
12 82
13 85
14 90
I would like to drop rows if the value in column a
is not within these multiple range
(10-15),(25-30),(50-55), (80-85). Such that these range are made from the 'lbotand
ltop`
lbot =[10, 25, 50, 80]
ltop=[15, 30, 55, 85]
I am thinking this can be achieve via pandas isin
df[df['a'].isin(list(zip(lbot,ltop)))]
But, it return empty df
instead.
The expected output is
a
10
11
26
51
83
82
85
CodePudding user response:
You can use numpy broadcasting to create a boolean mask where for each row it returns True if the value is within any of the ranges and filter df
with it.:
out = df[((df[['a']].to_numpy() >=lbot) & (df[['a']].to_numpy() <=ltop)).any(axis=1)]
Output:
a
4 10
5 11
8 26
9 51
11 83
12 82
13 85
CodePudding user response:
Create values in flatten list comprehension with range
:
df = df[df['a'].isin([z for x, y in zip(lbot,ltop) for z in range(x, y 1)])]
print (df)
a
4 10
5 11
8 26
9 51
11 83
12 82
13 85
Or use np.concatenate
for flatten list of ranges:
df = df[df['a'].isin(np.concatenate([range(x, y 1) for x, y in zip(lbot,ltop)]))]
CodePudding user response:
A method that uses between()
:
df[pd.concat([df['a'].between(x, y) for x,y in zip(lbot, ltop)], axis=1).any(axis=1)]
output:
a
4 10
5 11
8 26
9 51
11 83
12 82
13 85
CodePudding user response:
If your values in the two lists are sorted, a method that doesn't require any loop would be to use pandas.cut
and checking that you obtain the same group cutting on the two lists:
# group based on lower bound
id1 = pd.cut(df['a'], bins=lbot [float('inf')], labels=range(len(lbot)),
right=False) # include lower bound
# group based on upper bound
id2 = pd.cut(df['a'], bins=[0] ltop, labels=range(len(ltop)))
# ensure groups are identical
df[id1.eq(id2)]
output:
a
4 10
5 11
8 26
9 51
11 83
12 82
13 85
intermediate groups:
a id1 id2
0 1 NaN 0
1 2 NaN 0
2 1 NaN 0
3 7 NaN 0
4 10 0 0
5 11 0 0
6 21 0 1
7 22 0 1
8 26 1 1
9 51 2 2
10 56 2 3
11 83 3 3
12 82 3 3
13 85 3 3
14 90 3 NaN