Home > Net >  Can I use np.where to determine if a number is within a range of 2 numbers found in a list in anothe
Can I use np.where to determine if a number is within a range of 2 numbers found in a list in anothe

Time:01-10

I'm trying to determine if a number in 1 column (COL A) is within a range of 2 numbers found in a nested list in another column (COL B). COL B can have several nested lists of numbers.

example dataframe

This is being done in Python and pandas dataframes. I'm confident I can achieve this by using for loops but I'd like to utilize a vectorized solution. My attempts at using np.where include some variation of this (I've only gone so far as to address situations where column B only has 1 embedded list of numbers. I'm guessing I could use an embedded np.where statement in the False parameter...):

test_df['inRange'] = np.where(np.isin(test_df['COL A'],list(range(test_df['COL B'][0][0],test_df['COL B'][0][1]))), 'match', 'no match')

However, I keep getting an Index Error: list index out of range I speculate I'm not using the correct syntax to refer to embedded lists from another column when using np.where.

Any insight into how to achieve what I'm attempting is appreciated. Thank you!

CodePudding user response:

You can definitely do it without loops, but I don't think it'll be much better in terms of performance when compared to a looping approach (not a whole lot of truly vectorized operations; This solution is still just looping over the rows of the dataframe and the intervals in the 2d lists, pandas just does it under the hood).

Here the trick is to use a pd.IntervalIndex, but other things are possible too.

import numpy as np
import pandas as pd

def check_in_intervals(row):
    val, intervals = row
    idx = pd.IntervalIndex.from_tuples(
        list(map(tuple, intervals)),
        closed="left"
    )
    return idx.contains(val).any()

df = pd.DataFrame({
    "COL A": [19, 76, 98, 103],
    "COL B": [[[19, 38]], [[76, 84]], [[98, 129]], [[11, 23], [54, 333]]]
})

df["inRange"] = np.where(df.apply(check_in_intervals, axis=1), 'match', 'no match')

df:

   COL A                  COL B inRange
0     19             [[19, 38]]   match
1     76             [[76, 84]]   match
2     98            [[98, 129]]   match
3    103  [[11, 23], [54, 333]]   match
  • Related