I have a csv file with columns [ID] [Number1] [Number2]
Now I would like to check if [Number1] exists in [Number2] and if so, I want to copy the value of [ID] into a new column [ID2]. I've tried with np.where [Number1] == [Number2] however this is not the output I want as this is checking row by row values. What I want the script to do is Check if [Number1]'s value exists in the entirety of [Number2].
So kind of like a for loop of some sort where row 1 of [Number1] is looped until the entirety of column [Number2] is finished and then it moves onto the next row. Meaning that if Row 1 of [Number1] is 50131 and row 5555 of [Number2] is 50131, then I want the [ID] from row 1 to be copied into row 5555 as [ID2].
If the value does not exist, I would like to print 'No Match' string in the row.
Input:
ID | Number1 | Number2 |
---|---|---|
9991 | 123 | 678 |
9992 | 321 | 321 |
9993 | 543 | 123 |
9994 | 678 | 543 |
9995 | 921 | 001 |
Expected Result: (when comparing Number1 to Number2)
ID | Number1 | Number2 | ID2 |
---|---|---|---|
9991 | 123 | 678 | 9994 |
9992 | 321 | 321 | 9992 |
9993 | 543 | 123 | 9991 |
9994 | 678 | 543 | 9993 |
9995 | 921 | 001 | 'NO MATCH' |
CodePudding user response:
Assuming the following arrays:
ID = np.array([9991,9992,9993,9994,9995,9996])
Number1 = np.array([123,321,543,678,921,75])
Number2 = np.array([678,321,123,543,1,123])
by using advanced indexing you can rearrange for the ID2:
filled = np.isin(Number2, Number1)
ID2 = np.full(np.shape(ID), 'No Match')
idx = np.where(Number1[None, :] == Number2[:, None])[1]
ID_arr = ID[idx]
ID2[filled] = ID_arr
which will get the following result for ID2
:
['9994' '9992' '9991' '9993' 'No Match']
CodePudding user response:
You can get the desired result with doing two merges. Once to get the Number1
ID
s where Number2
matches and you merge those ID
s with Number2
to get the desired result.
connecting_ids = (df[['ID', 'Number1']]
.merge(df[['Number2']],
how='inner',
left_on='Number1',
right_on='Number2')
.drop('Number1', axis=1)
.rename(columns={'ID': 'ID2'})
)
(df.merge(connecting_ids,
how='left',
on='Number2')
.fillna('NO MATCH')
)