Check if a value in a column exists in another column, if so, copy value from another column into a-CodePudding

I have a csv file with columns [ID] [Number1] [Number2]

Now I would like to check if [Number1] exists in [Number2] and if so, I want to copy the value of [ID] into a new column [ID2]. I've tried with np.where [Number1] == [Number2] however this is not the output I want as this is checking row by row values. What I want the script to do is Check if [Number1]'s value exists in the entirety of [Number2].

So kind of like a for loop of some sort where row 1 of [Number1] is looped until the entirety of column [Number2] is finished and then it moves onto the next row. Meaning that if Row 1 of [Number1] is 50131 and row 5555 of [Number2] is 50131, then I want the [ID] from row 1 to be copied into row 5555 as [ID2].

If the value does not exist, I would like to print 'No Match' string in the row.

Input:

ID	Number1	Number2
9991	123	678
9992	321	321
9993	543	123
9994	678	543
9995	921	001

Expected Result: (when comparing Number1 to Number2)

ID	Number1	Number2	ID2
9991	123	678	9994
9992	321	321	9992
9993	543	123	9991
9994	678	543	9993
9995	921	001	'NO MATCH'

CodePudding user response：

Assuming the following arrays:

ID = np.array([9991,9992,9993,9994,9995,9996])
Number1 = np.array([123,321,543,678,921,75])
Number2 = np.array([678,321,123,543,1,123])

by using advanced indexing you can rearrange for the ID2:

filled = np.isin(Number2, Number1)
ID2 = np.full(np.shape(ID), 'No Match')

idx = np.where(Number1[None, :] == Number2[:, None])[1]
ID_arr = ID[idx]

ID2[filled] = ID_arr

which will get the following result for ID2:

['9994' '9992' '9991' '9993' 'No Match']

CodePudding user response：

You can get the desired result with doing two merges. Once to get the Number1 IDs where Number2 matches and you merge those IDs with Number2 to get the desired result.

connecting_ids = (df[['ID', 'Number1']]
          .merge(df[['Number2']], 
                 how='inner', 
                 left_on='Number1', 
                 right_on='Number2')
          .drop('Number1', axis=1)
          .rename(columns={'ID': 'ID2'})
)

(df.merge(connecting_ids, 
      how='left', 
      on='Number2')
  .fillna('NO MATCH')
)