I am trying to use mapping to match the column "Asset" between two dataframes, "names" and "match", and then, based on that match, add the column "paid" from "match" to "names". I am trying to do this without merging if possible, only using mapping. Thank you!
Below is how I tried doing it, but nothing is showing up in my names dataframe.
names['paid']=names.Asset.map(match.paid)
What I have is not giving me an error, but it's not adding anything.
CodePudding user response:
import pandas as pd

names = pd.DataFrame({'asset':[101, 250, 312, 632]})
match = pd.DataFrame({'asset':[100, 250, 127, 312], 'paid':[15,26,31,22]})
names:

| index | asset |
|---|---|
| 0 | 101 |
| 1 | 250 |
| 2 | 312 |
| 3 | 632 |
match:

| index | asset | paid |
|---|---|---|
| 0 | 100 | 15 |
| 1 | 250 | 26 |
| 2 | 127 | 31 |
| 3 | 312 | 22 |
match_asset = match['paid']
match_asset = match_asset.set_axis(match['asset'].to_list(), axis=0)
match_asset
100 15
250 26
127 31
312 22
Name: paid, dtype: int64
names['paid'] = names['asset'].map(match_asset)
names
| index | asset | paid |
|---|---|---|
| 0 | 101 | NaN |
| 1 | 250 | 26.0 |
| 2 | 312 | 22.0 |
| 3 | 632 | NaN |
Everything can be written in a single line:
names['paid'] = names['asset'].map(match['paid'].set_axis(match['asset'].to_list(), axis=0))
This avoids keeping an intermediate Series around, which can matter if the data is large.
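An equivalent variant (a sketch, not part of the original answer) builds the lookup Series with set_index instead of set_axis; it assumes the asset values in match are unique:
# Sketch: index match by 'asset' so map() can look values up directly
names['paid'] = names['asset'].map(match.set_index('asset')['paid'])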
The problem with your original version is that match.paid doesn't have match.asset as its index. So, when you try to use map, it doesn't know which row from match.paid should be assigned to each row of names.paid.
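To see this concretely with the example above, compare the two indexes (a small illustrative sketch):
match['paid'].index   # default RangeIndex 0..3 -- row positions, not asset ids
match_asset.index     # Index([100, 250, 127, 312]) -- asset ids, so map() can match on them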
If there are duplicates in asset and the paid value for those duplicates is the same, you can use
names['paid'] = names['asset'].map(match.drop_duplicates(subset=['asset'])['paid'].set_axis(match.drop_duplicates(subset=['asset'])['asset'].to_list(), axis=0))
However, note that this is going to map the paid value of the first occurrence of each asset, so it may be a problem if two equal assets have different prices.
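If duplicate assets can carry different prices, one possible approach (a sketch, not from the answer above) is to aggregate before mapping, for example averaging paid per asset:
# Sketch: reduce duplicates to a single value per asset (here the mean) before mapping
names['paid'] = names['asset'].map(match.groupby('asset')['paid'].mean())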
CodePudding user response:
I'm not sure if this would serve your needs, but I once created a function that behaves similarly to Excel's VLOOKUP (or rather XLOOKUP). Please see it below:
import numpy as np
import pandas as pd


def table_lookup(data: pd.DataFrame,
                 lookup_column: str,
                 lookup_value,
                 output_column: str,
                 value_pos: int = -1,
                 errors: str = 'raise'):
    """
    Python implementation of Excel's XLOOKUP function

    Parameters
    ----------
    data : pd.DataFrame
        DataFrame on which the lookup is performed
    lookup_column : str
        Column used as the key for the search
    lookup_value
        Value searched for in the key column
    output_column : str
        Column containing the value for the given key
    value_pos : int, optional
        In case of more than one match, regulates which one is returned. The default is -1.
    errors : str, optional
        Regulates the function's behaviour in case of an IndexError. The default is 'raise'.

    Returns
    -------
    Searched value for the given key
    """
    try:
        # Filter rows where the key column matches, keep only the output column
        df = data[data[lookup_column] == lookup_value][output_column]
        return df.values[value_pos]
    except IndexError as e:
        if errors == 'raise':
            raise Exception(f'Output does not contain any values at position {value_pos}') from e
        elif errors == 'ignore':
            return np.nan
        elif errors == 'coerce':
            # Fall back to the last match, or NaN if there were no matches at all
            return df.values[-1] if len(df) else np.nan
# Usage:
# df['new_col'] = df['key_col'].apply(lambda x: table_lookup(mapping_df, 'key_col', x, 'value_col'))
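For example, applied to the names and match dataframes from the first answer (an illustrative sketch; errors='ignore' makes unmatched assets come back as NaN instead of raising):
names['paid'] = names['asset'].apply(
    lambda x: table_lookup(match, 'asset', x, 'paid', errors='ignore'))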