Home > other >  Python - Pandas mapping error: Reindexing only valid with uniquely valued Index objects
Python - Pandas mapping error: Reindexing only valid with uniquely valued Index objects

Time:05-11

I am working on a function which is meant to map an ID from one dataframe to another based on one of two fields (Ticker or CUSIP). The main df is a custodian file which has the fields below:

Ticker CUSIP
0 AAA NaN
1 ABC NaN
2 NaN 123456789
3 UNK NaN

The map_df is a secondary file downloaded internally which maps the custodian file (df) to an arbitrary "Owned_ID" generated the first time a security is entered into our system. It looks something like:

Owned_ID Ticker CUSIP
0 0 AAA
1 11 ABC 111111111
2 22 NaN 123456789
3 33
4 44

My goal is to create a file which maps the CUSIP to the map file to pull the Owned ID. If a CUSIP is not available, then it should take the Ticker and pull an Owned ID. If neither the CUSIP nor Ticker can be found within the mapping file, it can return NaN. Therefore the output dataframe should look like:

Ticker CUSIP Owned_ID
0 AAA NaN 0
1 ABC NaN 11
2 NaN 123456789 22
3 UNK NaN NaN

I have come up with the code below which essentially maps the dataframes over by reindexing but keep running into the error:

  File "C:\Users\JeffNi\PycharmProjects\KP21 - Addepar Upload Generator\Archives\OwnedID Match Sample.py", line 41, in <module>
    df['Owned_ID'] = np.where(mask_a, df['CUSIP'].map(a)

  File "C:\Users\JeffNi\anaconda3\lib\site-packages\pandas\core\series.py", line 3909, in map
    new_values = super()._map_values(arg, na_action=na_action)

  File "C:\Users\JeffNi\anaconda3\lib\site-packages\pandas\core\base.py", line 907, in _map_values
    indexer = mapper.index.get_indexer(values)

  File "C:\Users\JeffNi\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3171, in get_indexer
    raise InvalidIndexError(

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Here is the code snippet I am using to replicate this error.


import pandas as pd
df = pd.DataFrame({'Ticker': {0: 'AAA', 1: 'ABC', 2: pd.NA, 3:'UNK'},  
                   'CUSIP': {0: pd.NA, 1: pd.NA, 2: '123456789', 3:pd.NA}})


df_map = pd.DataFrame(
{'Owned_ID' : {0: '11', 1: '22', 2: '33', 3:'44', 4:'55'},
'Ticker': {0: 'AAA', 1: 'ABC', 2: pd.NA, 3:pd.NA, 4:pd.NA}, 
'CUSIP': {0: pd.NA, 1: '111111111', 2: '123456789', 3:pd.NA, 4:pd.NA}})

a = df_map.set_index('CUSIP')['Owned_ID']

b = df_map.set_index('Ticker')['Owned_ID']

mask_a = df['CUSIP'].isnull()

df['Owned_ID'] = np.where(mask_a, df['CUSIP'].map(a), df['Ticker'].map(b))


print(df)

I think it might have something to do with the mapping file having duplicates in the form of NA, but I have not been able to figure out what I need to do to make the indexing work (or if there is a better way to approach this problem). I imagine it being similar to a Vlookup function in Excel, where I lookup CUSIP in map_df, it if ails, lookup Ticker in map_df and if that fails just return an error.

Thanks for any input!

CodePudding user response:

You should drop the na values first for the mapping dataframes:

a = df_map.dropna(subset=["CUSIP"]).set_index('CUSIP')['Owned_ID']

b = df_map.dropna(subset=["Ticker"]).set_index('Ticker')['Owned_ID']

mask_a = df['CUSIP'].isnull()

df['Owned_ID'] = np.where(mask_a, df['Ticker'].map(b), df['CUSIP'].map(a))

Note that you also needed to swap the np.where arguments, because they were the wrong way around.

To show that this is in the order you want:

df = pd.DataFrame({'Ticker': {0: 'AAA', 1: 'ABC', 2: "test", 3:'UNK'},  
                   'CUSIP': {0: pd.NA, 1: pd.NA, 2: '123456789', 3:pd.NA}})


df_map = pd.DataFrame(
{'Owned_ID' : {0: '11', 1: '22', 2: '33', 3:'44', 4:'55', 5: "66"},
'Ticker': {0: 'AAA', 1: 'ABC', 2: pd.NA, 3:pd.NA, 4:pd.NA, 5: "test"}, 
'CUSIP': {0: pd.NA, 1: '111111111', 2: '123456789', 3:pd.NA, 4:pd.NA, 5:pd.NA}})

a = df_map.dropna(subset=["CUSIP"]).set_index('CUSIP')['Owned_ID']

b = df_map.dropna(subset=["Ticker"]).set_index('Ticker')['Owned_ID']

mask_a = df['CUSIP'].isnull()

df['Owned_ID'] = np.where(mask_a, df['Ticker'].map(b), df['CUSIP'].map(a))
print(df)
#  Ticker      CUSIP Owned_ID
#0    AAA       <NA>       11
#1    ABC       <NA>       22
#2   test  123456789       33
#3    UNK       <NA>      NaN

Index "2" is 33, not 66 here, so took the CUSIP as this was available.

  • Related