Home > Software engineering >  Pandas Dataframe lookup based on condition
Pandas Dataframe lookup based on condition

Time:10-13

I have these two dfs. In the first dataframes, some rows are missing "merchant_type" codes. Though from the dataframe one can see that the "merchant_type_desc" with the missing merchant_type codes can also be deduced from the df

df1

merchant_type merchant_type_desc
8641 Associations - Civic, Social, and Frater
8641 Associations - Civic, Social, and Frater
8220 Colleges, Universities, Professional Sch
7922 Theatrical Producers (except Motion Pict
5072 Hardware Equipment and Supplies
5251 Hardware Stores
5200 Home Supply Warehouse
2741 Miscellaneous Publishing and Printing
5971 Art Dealers and Galleries
2741 Miscellaneous Publishing and Printing
Miscellaneous Publishing and Printing
Grocery Stores, Supermarkets
Grocery Stores, Supermarkets
5411 Grocery Stores, Supermarkets
3715 Fairfield Inn
3596 NEW MCC CODE
3771 NEW MCC CODE
7400 NEW MCC CODE
0 Default

df2

merchant_type merchant_type_desc
8641 Associations - Civic, Social, and Frater
8220 Colleges, Universities, Professional Sch
7922 Theatrical Producers (except Motion Pict
5072 Hardware Equipment and Supplies
5251 Hardware Stores
5200 Home Supply Warehouse
2741 Miscellaneous Publishing and Printing
5971 Art Dealers and Galleries
5411 Grocery Stores, Supermarkets
3715 Fairfield Inn
3596 NEW MCC CODE
3771 NEW MCC CODE
7400 NEW MCC CODE
0 Default

The second dataframe has the distinct merchant_type codes and their corresponding descriptions. Note the "NEW MCC CODE" merchant type description.

Merging the two dfs on merchant_type_desc column would loose the different/distinct merchant_types of the "NEW MCC CODE"

To lookup the missing merchant_type codes in df1 In excel I would have used an IF Function to first check the value of the "Merchant_type_description" if it equals "NEW MCC CODE" I would retain the present code, otherwise I would do vlookup on df2 and return the code from it. How do I achive this in Pandas dataframe.

Expected Output:

merchant_type merchant_type_desc
8641 Associations - Civic, Social, and Frater
8641 Associations - Civic, Social, and Frater
8220 Colleges, Universities, Professional Sch
7922 Theatrical Producers (except Motion Pict
5072 Hardware Equipment and Supplies
5251 Hardware Stores
5200 Home Supply Warehouse
2741 Miscellaneous Publishing and Printing
5971 Art Dealers and Galleries
2741 Miscellaneous Publishing and Printing
2741 Miscellaneous Publishing and Printing
5411 Grocery Stores, Supermarkets
5411 Grocery Stores, Supermarkets
5411 Grocery Stores, Supermarkets
3715 Fairfield Inn
3596 NEW MCC CODE
3771 NEW MCC CODE
7400 NEW MCC CODE
0 Default

CodePudding user response:

you can use

df3 = df2.loc[df2.merchant_type_desc == 'NEW MCC CODE']

to get just the rows with merchant type NEW MCC CODE

then gather all rows

df = pd.concat([df1, df2, df3])

remove the blanks, this will depend on exactly what the blanks are, in this case I've assumed they are empty strings

df = df.loc[~df.merchant_type == '']

then remove duplicate rows

df = df.drop_duplicates()

CodePudding user response:

You can use this if you prefer a one line solution:

df1.loc[df1.merchant_type.isna(), 'merchant_type']=df1.merge(df2[df2.merchant_type_desc!="NEW MCC CODE"], how='left', on='merchant_type_desc').loc[df1.merchant_type.isna(), 'merchant_type_y']
  • Related