pandas nested IFs and logic does not get the correct results-CodePudding

Not sure what is the issue in my flow:

This is sample df:

df = pd.DataFrame({'customer':['A','B','C','D','E','F'],
                   'Traveled':[1,1,1,0,1,0],
                   'Travel_count':[2,3,5,0,1,0],
                   'country1':['UK','Italy','CA', '0','UK','0'],
                   'country2':['JP','IN','CO','0','EG','0'],
                   'shopping':['High','High','High','High','Medium','Medium']
                   })

gives:

  customer  Traveled  Travel_count country1 country2 shopping
0        A         1             2       UK       JP     High
1        B         1             3    Italy       IN     High
2        C         1             5       CA       CO     High
3        D         0             0        0        0     High
4        E         1             1       UK       EG   Medium
5        F         0             0        0        0   Medium

I want to create some functions that automate filtering and then create a custom made df, so here are two functions that check customers on columns: Traveled == 1 and shopping == High :

def travel():
    if (df['Traveled'] == 1):
        return True
    else:
        return False

def shop_high():
    if (df['shopping'] == 'High'):
        return True
    else:
        return False

Here is a nested ifs code if the above conditions are True, it will check those who traveled more or less than 3 times:

def select(df):
    if(travel and shop_high):
        if (df['Travel_count'] > 3):
            return (df['customer'], df['shopping'], ('Customer {} traveled more than 3 times').format(df['customer']))
        elif (df['Travel_count'] < 3):
            return (df['customer'], df['shopping'], ('Customer {} traveled less than 3 times').format(df['customer']))

If I apply this function to the original df to automate filtration and checking travel count, I got wrong results:

pd.DataFrame(list(df.apply(select, axis = 1).dropna()))

results:

   0       1                                      2
0  A    High  Customer A traveled less than 3 times
1  C    High  Customer C traveled more than 3 times
2  D    High  Customer D traveled less than 3 times
3  E  Medium  Customer E traveled less than 3 times
4  F  Medium  Customer F traveled less than 3 times

Should be:

   0       1                                      2
0  A    High  Customer A traveled less than 3 times
1  C    High  Customer C traveled more than 3 times

CodePudding user response：

I would use boolean indexing and numpy.sign:

import numpy as np

travel = (np.sign(df['Travel_count'].sub(3))
            .map({1: ' traveled more than 3 times',
                  -1: ' traveled less than 3 times'})
         )

m1 = df['Traveled'].eq(1)
m2 = df['shopping'].eq('High')
m3 = travel.notna()

out = (df.loc[m1&m2&m3, ['customer', 'shopping']]
         .assign(new='Customer ' df['customer'] travel)
      )

output:

  customer shopping                                    new
0        A     High  Customer A traveled less than 3 times
2        C     High  Customer C traveled more than 3 times

CodePudding user response：

Using isin:

new_df = ( df[df[['Traveled', 'shopping']].isin(['High', 1]).all(axis=1) 
               & df['Travel_count'].ne(3)].reset_index(drop=True))
new_df['new'] = ('Customer '   new_df['customer']     ' traveled '    
            pd.Series(np.where(new_df['Travel_count'].lt(3), 'less', 'more'))  
           ' than 3 times')

CodePudding user response：

You can filter the dataframe by the 3 conditions and apply a simple function for the print

des = lambda row: f'Customer {row["customer"]} traveled {"more" if row["Travel_count"] > 3 else "less"} than 3 times'

df = df.loc[(df['Traveled'] == 1) & (df['shopping'] == 'High') & (df['Travel_count'] != 3)]
df['description'] = df.apply(lambda row: des(row), axis=1)
df = df[['customer', 'shopping', 'description']]

Output

  customer shopping                            description
0        A     High  Customer A traveled less than 3 times
2        C     High  Customer C traveled more than 3 times