import pandas as pd

# Sample data; the row order matches the table shown below
df = pd.DataFrame({'ID' : ['ID 1', 'ID 1', 'ID 1', 'ID 2', 'ID 2', 'ID 3', 'ID 3'],
                   'Code' : ['Apple', 'Apple', 'A123', 'Banana', 'Banana', 'K123', 'K123'],
                   'Code_Type' : ['Code name', 'Code name', 'Code ID', 'Code name', 'Code name', 'Code ID', 'Code ID']}
)
df
I have a pandas dataframe (~100k rows) that looks something like this.
ID    Code    Code_Type
ID 1  Apple   Code name
ID 1  Apple   Code name
ID 1  A123    Code ID
ID 2  Banana  Code name
ID 2  Banana  Code name
ID 3  K123    Code ID
ID 3  K123    Code ID
I am trying to iterate through my dataframe and, for each ID, set the code based on which code types that ID has.
If an ID has both a code name and a code ID associated with it, then take the code ID value and apply it to the Code column for every row of that ID.
If it has only a code name or only a code ID, leave it unchanged.
So far the setup I have is something like this.
for index, value, value2 in zip(df.ID, df.Code, df.Code_Type):
    print(index, value, value2)
However, I am not sure how to get from here to the resulting dataframe below.
ID    Code    Code_Type
ID 1  A123    Code name
ID 1  A123    Code name
ID 1  A123    Code ID
ID 2  Banana  Code name
ID 2  Banana  Code name
ID 3  K123    Code ID
ID 3  K123    Code ID
Ideally I would like to create a dictionary mapping like this and just apply that to the dataframe.
{'ID 1' : 'A123',
'ID 2' : 'Banana',
'ID 3' : 'K123'}
Any help at all is greatly appreciated.
CodePudding user response:
# Select the rows that carry a code ID (comparing ID to Code_Type never matches)
df.query("Code_Type == 'Code ID'")
CodePudding user response:
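Code:
df.groupby('ID')['Code'].last().to_dict()
Note that this takes the last row of each ID, so it returns the code ID only when that row comes last within its group, as it does in the table in the question.
Output:
{'ID 1': 'A123', 'ID 2': 'Banana', 'ID 3': 'K123'}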
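If you would rather not depend on row order, here is a minimal sketch of one way to build the mapping and apply it. It assumes Code_Type only ever contains 'Code name' and 'Code ID', and that each ID has at most one distinct code ID. The `priority` column and the `mapping` and `result` names are made up for illustration. The sample frame puts the code ID row in the middle of ID 1 on purpose, to show that order does not matter.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ID 1', 'ID 1', 'ID 1', 'ID 2', 'ID 2', 'ID 3', 'ID 3'],
    'Code': ['Apple', 'A123', 'Apple', 'Banana', 'Banana', 'K123', 'K123'],
    'Code_Type': ['Code name', 'Code ID', 'Code name', 'Code name',
                  'Code name', 'Code ID', 'Code ID'],
})

# Rank the code types so that 'Code ID' sorts after 'Code name'.
# (Assumption: these are the only two values in Code_Type.)
priority = df['Code_Type'].map({'Code name': 0, 'Code ID': 1})

# Sort by priority, then take the last row per ID: that row is a
# 'Code ID' row whenever the ID has one, otherwise a 'Code name' row.
mapping = (
    df.assign(priority=priority)
      .sort_values('priority', kind='mergesort')
      .groupby('ID')['Code']
      .last()
      .to_dict()
)

# Apply the mapping to every row; Code_Type is left untouched.
result = df.assign(Code=df['ID'].map(mapping))
print(mapping)
print(result)
```

This stays vectorised, so it should cope with ~100k rows without an explicit Python loop. If an ID could carry several different code IDs, you would need to decide which one wins before building the mapping.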