Map column values by ID based on multiple conditions

Time:11-24

import pandas as pd

# Row order matches the table shown below: the 'Code ID' row comes last for ID 1.
df = pd.DataFrame({'ID' : ['ID 1', 'ID 1', 'ID 1', 'ID 2', 'ID 2', 'ID 3', 'ID 3'],
                   'Code' : ['Apple', 'Apple', 'A123', 'Banana', 'Banana', 'K123', 'K123'],
                   'Code_Type' : ['Code name', 'Code name', 'Code ID', 'Code name', 'Code name', 'Code ID', 'Code ID']}
                 )

df

I have a pandas dataframe (~100k rows) that looks something like this.

ID      Code    Code_Type
ID 1    Apple   Code name
ID 1    Apple   Code name
ID 1    A123    Code ID
ID 2    Banana  Code name
ID 2    Banana  Code name
ID 3    K123    Code ID
ID 3    K123    Code ID

For each ID, I want to pick the code based on which code types that ID has.

If an ID has both a code name and a code ID associated with it, take the code ID value and apply it to the Code column for every row of that ID.

If it has only a code name or only a code ID, leave it unchanged.

So far the setup I have is something like this.

for index, value, value2 in zip(df.ID, df.Code, df.Code_Type):
    print(index, value, value2)

However, I am not sure how to get from here to the resulting dataframe below.

ID      Code    Code_Type
ID 1    A123    Code name
ID 1    A123    Code name
ID 1    A123    Code ID
ID 2    Banana  Code name
ID 2    Banana  Code name
ID 3    K123    Code ID
ID 3    K123    Code ID

Ideally I would like to create a dictionary mapping like this and just apply that to the dataframe.

{'ID 1' : 'A123',
 'ID 2' : 'Banana',
 'ID 3' : 'K123'}

Any help at all is greatly appreciated.

CodePudding user response:

df.query("Code_Type == 'Code ID'")
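
A filter on Code_Type only returns the code ID rows, so it still has to be turned into the mapping the question asks for. Here is a minimal sketch of one way to do that, assuming the sample frame from the question and assuming that an ID without any code ID should keep its first code. The names `id_codes`, `fallback` and `mapping` are my own, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ID 1', 'ID 1', 'ID 1', 'ID 2', 'ID 2', 'ID 3', 'ID 3'],
    'Code': ['Apple', 'Apple', 'A123', 'Banana', 'Banana', 'K123', 'K123'],
    'Code_Type': ['Code name', 'Code name', 'Code ID', 'Code name',
                  'Code name', 'Code ID', 'Code ID'],
})

# IDs that have a code ID: keep the first code ID seen for each ID.
id_codes = (
    df.query("Code_Type == 'Code ID'")
      .drop_duplicates('ID')
      .set_index('ID')['Code']
)

# Fallback for IDs without a code ID (assumption): the first code of any type.
fallback = df.drop_duplicates('ID').set_index('ID')['Code']

# Prefer the code ID where one exists, otherwise use the fallback.
mapping = id_codes.combine_first(fallback).to_dict()

# Apply the mapping back onto the frame.
df['Code'] = df['ID'].map(mapping)
```

For the sample data, `mapping` should come out as the dictionary shown in the question, and the Code column for ID 1 becomes A123 on all three rows.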

CodePudding user response:

Code:

df.groupby('ID')['Code'].last().to_dict()

Output:

{'ID 1': 'A123', 'ID 2': 'Banana', 'ID 3': 'K123'}
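
Note that `last()` picks whatever row happens to come last within each ID, so it only gives the code ID when that row is the final one in its group. If the row order is not guaranteed, something like the sketch below may be safer. It assumes the column names from the question and deliberately puts the code ID row in the middle of ID 1; `code_ids` and `per_id` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ID 1', 'ID 1', 'ID 1', 'ID 2', 'ID 2', 'ID 3', 'ID 3'],
    'Code': ['Apple', 'A123', 'Apple', 'Banana', 'Banana', 'K123', 'K123'],
    'Code_Type': ['Code name', 'Code ID', 'Code name', 'Code name',
                  'Code name', 'Code ID', 'Code ID'],
})

# Blank out code names so that only code IDs remain (everything else is NaN).
code_ids = df['Code'].where(df['Code_Type'] == 'Code ID')

# Broadcast the first non-null code ID of each ID to every row of that ID.
per_id = code_ids.groupby(df['ID']).transform('first')

# IDs with no code ID at all keep their original code.
df['Code'] = per_id.fillna(df['Code'])
```

This avoids both the Python loop and the intermediate dictionary, which should matter little at ~100k rows but keeps everything vectorized.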