Home > OS >  Best way to filter for multiple conditions in the same df
Best way to filter for multiple conditions in the same df

Time:12-01

Ive got a df thats been merged and I want to do some logic to it so that I capture issues from the data sources.

I want to capture both when theres a situation when the Areacode's match but T's do not AND when both Areacode's and T's dont match at all.

Here's a merged_df before the filter.

Name     t_1        Areacode_1     t_2       Areacode_2

Jerry    New Jersey    12674       Texas       12674
Elaine   New York      98765       Alaska      78654
George   New York      12345       New York    12345

Is there a way to do this all in one filter? This is what I have so far, but it would be nice to put it as one line:

m = merged_df.loc[(merged_df['t_1'] != merged_df['t_2']) & (merged_df['Areacode_1'] == merged_df['Areacode_2']) ]
m2 = merged_df.loc[(merged_df['t_1'] != merged_df['t_2']) & (merged_df['Areacode_1'] != merged_df['Areacode_2']) ]

After the filter Id expect George to be removed because all columns matched.

Expected merged_df:

Name     t_1        Areacode_1     t_2       Areacode_2

Jerry    New Jersey    12674       Texas       12674
Elaine   New York      98765       Alaska      78654

CodePudding user response:

You could do it like this:

import pandas as pd

merged_df = pd.DataFrame({'Name':['Jerry','Elaine','George'], 
                          't_1':['New Jersey', 'New York','New York'], 
                          'Areacode_1': [12674,98765,12345], 
                          't_2':['Texas','Alaska','New York'],
                          'Areacode_2':[12674,78654,12345]})
                  
filtered1 = merged_df.loc[~((merged_df.t_1 == merged_df.t_2) & (merged_df.Areacode_1 == merged_df.Areacode_2))]
display(filtered1)

filtered2 = merged_df.loc[(merged_df.t_1 != merged_df.t_2)]
display(filtered2)

Note that filtered1 shows the same output as filtered2 and is the same as your 'Expected merged_df'.
Both will essentially meet your criteria.

CodePudding user response:

I used np.where to solve this.

merged_df2 = merged_df.assign(Filter = np.where((merged_df['Salesforce_Territory'] !=
merged_df['Snowflake Territory']) & (merged_df['Salesforce_Zip_Code'] != merged_df['Snowflake Zip']) | 
((merged_df['Salesforce_Territory'] != merged_df['Snowflake Territory'])), True, False))
  • Related