I am trying to use mapping to match the column "Asset" between two dataframes, "names" and "match", and then, based on that match, add the column "paid" from "match" to "names". I am trying to do this without merging if possible, only using mapping. Thank you!
Below is how I tried doing it, but nothing is showing up in my names dataframe.
names['paid']=names.Asset.map(match.paid)
What I have is not giving me an error, but it's not adding anything.
CodePudding user response:
import pandas as pd

names = pd.DataFrame({'asset':[101, 250, 312, 632]})
match = pd.DataFrame({'asset':[100, 250, 127, 312], 'paid':[15,26,31,22]})
names:

| index | asset |
|---|---|
| 0 | 101 |
| 1 | 250 |
| 2 | 312 |
| 3 | 632 |
match:

| index | asset | paid |
|---|---|---|
| 0 | 100 | 15 |
| 1 | 250 | 26 |
| 2 | 127 | 31 |
| 3 | 312 | 22 |
match_asset = match['paid']
match_asset = match_asset.set_axis(match['asset'].to_list(), axis=0)
match_asset
100 15
250 26
127 31
312 22
Name: paid, dtype: int64
names['paid'] = names['asset'].map(match_asset)
names
| index | asset | paid |
|---|---|---|
| 0 | 101 | NaN |
| 1 | 250 | 26.0 |
| 2 | 312 | 22.0 |
| 3 | 632 | NaN |
Everything can be written in a single line:
names['paid'] = names['asset'].map(match['paid'].set_axis(match['asset'].to_list(), axis=0))
This avoids keeping an intermediate Series around, which can matter if the data is large.
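An equivalent variant (a sketch, not part of the original answer) builds the lookup Series with set_index instead of set_axis; it assumes the asset values in match are unique:
# Sketch: index match by 'asset' so map() can look values up directly
names['paid'] = names['asset'].map(match.set_index('asset')['paid'])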
The problem with your original version is that match.paid doesn't have match.asset as its index. So, when you try to use map, it doesn't know which row from match.paid should be assigned to each row of names.paid.
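To see this concretely with the example above, compare the two indexes (a small illustrative sketch):
match['paid'].index   # default RangeIndex 0..3 -- row positions, not asset ids
match_asset.index     # Index([100, 250, 127, 312]) -- asset ids, so map() can match on them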
If there are duplicates in asset and the paid value for those duplicates is the same, you can use
names['paid'] = names['asset'].map(match.drop_duplicates(subset=['asset'])['paid'].set_axis(match.drop_duplicates(subset=['asset'])['asset'].to_list(), axis=0))
However, note that this is going to map the paid value of the first occurrence of each asset, so it may be a problem if two equal assets have different prices.
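If duplicate assets can carry different prices, one possible approach (a sketch, not from the answer above) is to aggregate before mapping, for example averaging paid per asset:
# Sketch: reduce duplicates to a single value per asset (here the mean) before mapping
names['paid'] = names['asset'].map(match.groupby('asset')['paid'].mean())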
CodePudding user response:
I'm not sure if this would serve your needs, but I once created a function that behaves similarly to Excel's VLOOKUP (or rather XLOOKUP). Please see it below:
import numpy as np
import pandas as pd


def table_lookup(data: pd.DataFrame,
                 lookup_column: str,
                 lookup_value,
                 output_column: str,
                 value_pos: int = -1,
                 errors: str = 'raise'):
    """
    Python implementation of Excel's XLOOKUP function

    Parameters
    ----------
    data : pd.DataFrame
        DataFrame on which the lookup is performed
    lookup_column : str
        Column used as the key for the search
    lookup_value
        Value searched for in the key column
    output_column : str
        Column containing the value for the given key
    value_pos : int, optional
        In case of more than one match, regulates which one is returned. The default is -1.
    errors : str, optional
        Regulates the function's behaviour in case of an IndexError. The default is 'raise'.

    Returns
    -------
    Searched value for the given key
    """
    try:
        # Filter rows where the key column matches, keep only the output column
        df = data[data[lookup_column] == lookup_value][output_column]
        return df.values[value_pos]
    except IndexError as e:
        if errors == 'raise':
            raise Exception(f'Output does not contain any values at position {value_pos}') from e
        elif errors == 'ignore':
            return np.nan
        elif errors == 'coerce':
            # Fall back to the last match, or NaN if there were no matches at all
            return df.values[-1] if len(df) else np.nan
# Usage:
# df['new_col'] = df['key_col'].apply(lambda x: table_lookup(mapping_df, 'key_col', x, 'value_col'))
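For example, applied to the names and match dataframes from the first answer (an illustrative sketch; errors='ignore' makes unmatched assets come back as NaN instead of raising):
names['paid'] = names['asset'].apply(
    lambda x: table_lookup(match, 'asset', x, 'paid', errors='ignore'))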