I have two numpy arrays, x
and y
. They each have 3 columns and the first two columns are identifying in the sense of a relational database compound primary key. I want to merge these two arrays based on these compound primary keys, while adding the third column where the two arrays overlap.
>>> x = np.array([[1, 2, 1.5], [3, 4, 2.5], [5, 6, 3.5]])
>>> x
array([[1. , 2. , 1.5],
[3. , 4. , 2.5],
[5. , 6. , 3.5]])
>>> y = np.array([[1, 2, 4.5], [3, 4, 5.5], [7, 8, 6.5]])
>>> y
array([[1. , 2. , 4.5],
[3. , 4. , 5.5],
[7. , 8. , 6.5]])
I need a method to create array z
, which combines x
and y
like an upsert:
>>> z = np.array([[1, 2, 6.0], [3, 4, 7.0], [5, 6, 3.5], [7, 8, 6.5]])
>>> z
array([[1. , 2. , 6. ],
[3. , 4. , 7. ],
[5. , 6. , 3.5],
[7. , 8. , 6.5]])
Is it possible to do this in Numpy?
CodePudding user response:
This operation can be done in the following steps.
- Get a mask whether the first two columns are equal.
mask = x[:, :2] == y[:, :2]
- Update the existing values:
x[np.all(mask, axis=1), 2] = x[np.all(mask, axis=1), 2] y[np.all(mask, axis=1), 2]
- Inserting non existing rows:
x = np.append(x, y[np.any(~mask, axis=1)], axis=0)
Edit: For unsorted arrays of different size.
In this case I couldn't find a way as clean as the above case. But the following will perform the job for unsorted and nonequal size arrays.
import numpy as np
x = np.array([[1, 2, 1.5], [3, 4, 2.5], [5, 6, 3.5]])
y = np.array([[1, 2, 4.5], [7, 8, 6.5], [3, 4, 5.5], [8, 9, 7.5]])
print(x)
print(y)
non_existing_rows = []
# Update existing rows and store non existing rows.
for row in y:
mask = np.all(x[:,:2] == row[:2], axis = 1)
if np.all(~mask):
# The row does not exist in x.
non_existing_rows.append(row)
else:
# The row exists in x.
x[mask,2] = x[mask,2] row[2]
# Insert non existing rows.
x = np.append(x, non_existing_rows, axis=0)
print(x)
I assume that the size of x is bigger than y in general, so it is better to iterate over y. Also appending operation is faster for lists, so it is better to first aggregate non existing rows in a list and then append them to the numpy array as a whole.