Home > Back-end >  Numpy Select with Multiple Conditions Not Returning Values
Numpy Select with Multiple Conditions Not Returning Values

Time:03-15

I combined three dataframes that have multiple email columns, then renamed the columns for ease of scripting. I'm trying to create a master email column based on conditions. If C is populated, use C. If not, and B is populated, use B. If not, and A is populated, use A. For some reason it's only filling in the master column if C is populated. Even if A is populated, it's returning a null value.

dfs = [df_A, d365, df_C]
from functools import reduce
dfFinal = reduce(lambda left,right: pd.merge(left,right,on='leadId'), dfs)

import numpy as np
emailconditions = [
dfFinal['EmailC'] is not None,
(dfFinal['EmailC'] is None) & (dfFinal['EmailB'] is not None),
(dfFinal['EmailC'] is None) & (dfFinal['EmailB'] is None)]

emailvalues = [
dfFinal['EmailC'],
dfFinal['EmailB'],
dfFinal['EmailA']]

dfFinal['emailFinal'] = np.select(emailconditions, emailvalues)

CodePudding user response:

Try setting emailconditions like so:

emailconditions = [
dfFinal['EmailC'].notna(),
dfFinal['EmailC'].isna() & dfFinal['EmailB'].notna(),
dfFinal['EmailC'].isna() & dfFinal['EmailB'].isna()]

Key point is to use notna() instead of is not None and isna() instead of is None.

CodePudding user response:

Here's an alternative solution, which just grabs the first truthy value out of the columns in whichever order you give priority:

In [3]: df
Out[3]:
      a     b     c
0     x     w  None
1  None     y  None
2     k  None     z

In [4]: order = ("c", "b", "a")

In [5]: df.apply(lambda row: next(row[col] for col in order if row[col]), axis=1)
Out[5]:
0    w
1    y
2    z
dtype: object

If you anticipate having rows where none of the columns have a value, then you'd probably want something like this:

def first_truthy(row, order):
    try:
        return next(row[col] for col in order if row[col])
    except StopIteration:
        return None

Output:

In [7]: df
Out[7]:
      a     b     c
0     x     w  None
1  None     y  None
2     k  None     z
3  None  None  None


In [8]: df.apply(lambda row: first_truthy(row, order), axis=1)
Out[8]:
0       w
1       y
2       z
3    None
dtype: object

Not that this is likely much slower than boolean masking, but (in my opinion) easier to reason about and debug, and doesn't require the extra dependency on numpy. If you need performance, @richardec's solution is likely vastly superior, though I have not benchmarked our solutions.

  • Related