Pandas Dataframe lookup based on condition-CodePudding

I have these two dfs. In the first dataframes, some rows are missing "merchant_type" codes. Though from the dataframe one can see that the "merchant_type_desc" with the missing merchant_type codes can also be deduced from the df

df1

merchant_type	merchant_type_desc
8641	Associations - Civic, Social, and Frater
8641	Associations - Civic, Social, and Frater
8220	Colleges, Universities, Professional Sch
7922	Theatrical Producers (except Motion Pict
5072	Hardware Equipment and Supplies
5251	Hardware Stores
5200	Home Supply Warehouse
2741	Miscellaneous Publishing and Printing
5971	Art Dealers and Galleries
2741	Miscellaneous Publishing and Printing
	Miscellaneous Publishing and Printing
	Grocery Stores, Supermarkets
	Grocery Stores, Supermarkets
5411	Grocery Stores, Supermarkets
3715	Fairfield Inn
3596	NEW MCC CODE
3771	NEW MCC CODE
7400	NEW MCC CODE
0	Default

df2

merchant_type	merchant_type_desc
8641	Associations - Civic, Social, and Frater
8220	Colleges, Universities, Professional Sch
7922	Theatrical Producers (except Motion Pict
5072	Hardware Equipment and Supplies
5251	Hardware Stores
5200	Home Supply Warehouse
2741	Miscellaneous Publishing and Printing
5971	Art Dealers and Galleries
5411	Grocery Stores, Supermarkets
3715	Fairfield Inn
3596	NEW MCC CODE
3771	NEW MCC CODE
7400	NEW MCC CODE
0	Default

The second dataframe has the distinct merchant_type codes and their corresponding descriptions. Note the "NEW MCC CODE" merchant type description.

Merging the two dfs on merchant_type_desc column would loose the different/distinct merchant_types of the "NEW MCC CODE"

To lookup the missing merchant_type codes in df1 In excel I would have used an IF Function to first check the value of the "Merchant_type_description" if it equals "NEW MCC CODE" I would retain the present code, otherwise I would do vlookup on df2 and return the code from it. How do I achive this in Pandas dataframe.

Expected Output:

merchant_type	merchant_type_desc
8641	Associations - Civic, Social, and Frater
8641	Associations - Civic, Social, and Frater
8220	Colleges, Universities, Professional Sch
7922	Theatrical Producers (except Motion Pict
5072	Hardware Equipment and Supplies
5251	Hardware Stores
5200	Home Supply Warehouse
2741	Miscellaneous Publishing and Printing
5971	Art Dealers and Galleries
2741	Miscellaneous Publishing and Printing
2741	Miscellaneous Publishing and Printing
5411	Grocery Stores, Supermarkets
5411	Grocery Stores, Supermarkets
5411	Grocery Stores, Supermarkets
3715	Fairfield Inn
3596	NEW MCC CODE
3771	NEW MCC CODE
7400	NEW MCC CODE
0	Default

CodePudding user response：

you can use

df3 = df2.loc[df2.merchant_type_desc == 'NEW MCC CODE']

to get just the rows with merchant type NEW MCC CODE

then gather all rows

df = pd.concat([df1, df2, df3])

remove the blanks, this will depend on exactly what the blanks are, in this case I've assumed they are empty strings

df = df.loc[~df.merchant_type == '']

then remove duplicate rows

df = df.drop_duplicates()

CodePudding user response：

You can use this if you prefer a one line solution:

df1.loc[df1.merchant_type.isna(), 'merchant_type']=df1.merge(df2[df2.merchant_type_desc!="NEW MCC CODE"], how='left', on='merchant_type_desc').loc[df1.merchant_type.isna(), 'merchant_type_y']