Ive got a df thats been merged and I want to do some logic to it so that I capture issues from the data sources.
I want to capture both when theres a situation when the Areacode's match but T's do not AND when both Areacode's and T's dont match at all.
Here's a merged_df before the filter.
Name t_1 Areacode_1 t_2 Areacode_2
Jerry New Jersey 12674 Texas 12674
Elaine New York 98765 Alaska 78654
George New York 12345 New York 12345
Is there a way to do this all in one filter? This is what I have so far, but it would be nice to put it as one line:
m = merged_df.loc[(merged_df['t_1'] != merged_df['t_2']) & (merged_df['Areacode_1'] == merged_df['Areacode_2']) ]
m2 = merged_df.loc[(merged_df['t_1'] != merged_df['t_2']) & (merged_df['Areacode_1'] != merged_df['Areacode_2']) ]
After the filter Id expect George to be removed because all columns matched.
Expected merged_df:
Name t_1 Areacode_1 t_2 Areacode_2
Jerry New Jersey 12674 Texas 12674
Elaine New York 98765 Alaska 78654
CodePudding user response:
You could do it like this:
import pandas as pd
merged_df = pd.DataFrame({'Name':['Jerry','Elaine','George'],
't_1':['New Jersey', 'New York','New York'],
'Areacode_1': [12674,98765,12345],
't_2':['Texas','Alaska','New York'],
'Areacode_2':[12674,78654,12345]})
filtered1 = merged_df.loc[~((merged_df.t_1 == merged_df.t_2) & (merged_df.Areacode_1 == merged_df.Areacode_2))]
display(filtered1)
filtered2 = merged_df.loc[(merged_df.t_1 != merged_df.t_2)]
display(filtered2)
Note that filtered1 shows the same output as filtered2 and is the same as your 'Expected merged_df'.
Both will essentially meet your criteria.
CodePudding user response:
I used np.where to solve this.
merged_df2 = merged_df.assign(Filter = np.where((merged_df['Salesforce_Territory'] !=
merged_df['Snowflake Territory']) & (merged_df['Salesforce_Zip_Code'] != merged_df['Snowflake Zip']) |
((merged_df['Salesforce_Territory'] != merged_df['Snowflake Territory'])), True, False))