I have these two dfs. In the first dataframes, some rows are missing "merchant_type" codes. Though from the dataframe one can see that the "merchant_type_desc" with the missing merchant_type codes can also be deduced from the df
df1
merchant_type | merchant_type_desc |
---|---|
8641 | Associations - Civic, Social, and Frater |
8641 | Associations - Civic, Social, and Frater |
8220 | Colleges, Universities, Professional Sch |
7922 | Theatrical Producers (except Motion Pict |
5072 | Hardware Equipment and Supplies |
5251 | Hardware Stores |
5200 | Home Supply Warehouse |
2741 | Miscellaneous Publishing and Printing |
5971 | Art Dealers and Galleries |
2741 | Miscellaneous Publishing and Printing |
Miscellaneous Publishing and Printing | |
Grocery Stores, Supermarkets | |
Grocery Stores, Supermarkets | |
5411 | Grocery Stores, Supermarkets |
3715 | Fairfield Inn |
3596 | NEW MCC CODE |
3771 | NEW MCC CODE |
7400 | NEW MCC CODE |
0 | Default |
df2
merchant_type | merchant_type_desc |
---|---|
8641 | Associations - Civic, Social, and Frater |
8220 | Colleges, Universities, Professional Sch |
7922 | Theatrical Producers (except Motion Pict |
5072 | Hardware Equipment and Supplies |
5251 | Hardware Stores |
5200 | Home Supply Warehouse |
2741 | Miscellaneous Publishing and Printing |
5971 | Art Dealers and Galleries |
5411 | Grocery Stores, Supermarkets |
3715 | Fairfield Inn |
3596 | NEW MCC CODE |
3771 | NEW MCC CODE |
7400 | NEW MCC CODE |
0 | Default |
The second dataframe has the distinct merchant_type codes and their corresponding descriptions. Note the "NEW MCC CODE" merchant type description.
Merging the two dfs on merchant_type_desc column would loose the different/distinct merchant_types of the "NEW MCC CODE"
To lookup the missing merchant_type codes in df1 In excel I would have used an IF Function to first check the value of the "Merchant_type_description" if it equals "NEW MCC CODE" I would retain the present code, otherwise I would do vlookup on df2 and return the code from it. How do I achive this in Pandas dataframe.
Expected Output:
merchant_type | merchant_type_desc |
---|---|
8641 | Associations - Civic, Social, and Frater |
8641 | Associations - Civic, Social, and Frater |
8220 | Colleges, Universities, Professional Sch |
7922 | Theatrical Producers (except Motion Pict |
5072 | Hardware Equipment and Supplies |
5251 | Hardware Stores |
5200 | Home Supply Warehouse |
2741 | Miscellaneous Publishing and Printing |
5971 | Art Dealers and Galleries |
2741 | Miscellaneous Publishing and Printing |
2741 | Miscellaneous Publishing and Printing |
5411 | Grocery Stores, Supermarkets |
5411 | Grocery Stores, Supermarkets |
5411 | Grocery Stores, Supermarkets |
3715 | Fairfield Inn |
3596 | NEW MCC CODE |
3771 | NEW MCC CODE |
7400 | NEW MCC CODE |
0 | Default |
CodePudding user response:
you can use
df3 = df2.loc[df2.merchant_type_desc == 'NEW MCC CODE']
to get just the rows with merchant type NEW MCC CODE
then gather all rows
df = pd.concat([df1, df2, df3])
remove the blanks, this will depend on exactly what the blanks are, in this case I've assumed they are empty strings
df = df.loc[~df.merchant_type == '']
then remove duplicate rows
df = df.drop_duplicates()
CodePudding user response:
You can use this if you prefer a one line solution:
df1.loc[df1.merchant_type.isna(), 'merchant_type']=df1.merge(df2[df2.merchant_type_desc!="NEW MCC CODE"], how='left', on='merchant_type_desc').loc[df1.merchant_type.isna(), 'merchant_type_y']