Home > database >  Check if a value in a column exists in another column, if so, copy value from another column into a
Check if a value in a column exists in another column, if so, copy value from another column into a

Time:11-20

I have a csv file with columns [ID] [Number1] [Number2]

Now I would like to check if [Number1] exists in [Number2] and if so, I want to copy the value of [ID] into a new column [ID2]. I've tried with np.where [Number1] == [Number2] however this is not the output I want as this is checking row by row values. What I want the script to do is Check if [Number1]'s value exists in the entirety of [Number2].

So kind of like a for loop of some sort where row 1 of [Number1] is looped until the entirety of column [Number2] is finished and then it moves onto the next row. Meaning that if Row 1 of [Number1] is 50131 and row 5555 of [Number2] is 50131, then I want the [ID] from row 1 to be copied into row 5555 as [ID2].

If the value does not exist, I would like to print 'No Match' string in the row.

Input:

ID Number1 Number2
9991 123 678
9992 321 321
9993 543 123
9994 678 543
9995 921 001

Expected Result: (when comparing Number1 to Number2)

ID Number1 Number2 ID2
9991 123 678 9994
9992 321 321 9992
9993 543 123 9991
9994 678 543 9993
9995 921 001 'NO MATCH'

CodePudding user response:

Assuming the following arrays:

ID = np.array([9991,9992,9993,9994,9995,9996])
Number1 = np.array([123,321,543,678,921,75])
Number2 = np.array([678,321,123,543,1,123])

by using advanced indexing you can rearrange for the ID2:

filled = np.isin(Number2, Number1)
ID2 = np.full(np.shape(ID), 'No Match')

idx = np.where(Number1[None, :] == Number2[:, None])[1]
ID_arr = ID[idx]

ID2[filled] = ID_arr

which will get the following result for ID2:

['9994' '9992' '9991' '9993' 'No Match']

CodePudding user response:

You can get the desired result with doing two merges. Once to get the Number1 IDs where Number2 matches and you merge those IDs with Number2 to get the desired result.

connecting_ids = (df[['ID', 'Number1']]
          .merge(df[['Number2']], 
                 how='inner', 
                 left_on='Number1', 
                 right_on='Number2')
          .drop('Number1', axis=1)
          .rename(columns={'ID': 'ID2'})
)

(df.merge(connecting_ids, 
      how='left', 
      on='Number2')
  .fillna('NO MATCH')
)
  • Related