numpy.union that preserves order

Two arrays have been produced by dropping random values from an original array (with unique, unsorted elements):

import numpy as np

orig = np.array([2, 1, 7, 5, 3, 8])

Let's say these arrays are:

a = np.array([2, 1, 7,    8])
b = np.array([2,    7, 3, 8])

Given just these two arrays, I need to merge them so that the dropped values end up in their correct positions.

The result should be:

result = np.array([2, 1, 7, 3, 8])

Another example:

a1 = np.array([2, 1, 7, 5,    8])
b1 = np.array([2,       5, 3, 8])
# the result should be: [2, 1, 7, 5, 3, 8]

Edit:

This question is ambiguous because it is unclear what to do in this situation:

a2 = np.array([2, 1, 7,       8])
b2 = np.array([2,       5, 3, 8])
# the result should be: ???
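The ambiguity can be stated precisely: a merge is only ambiguous when, between the same pair of shared values, both arrays contain values that the other one lacks (here `1, 7` in `a2` versus `5, 3` in `b2`, both after `2`). A minimal pure-Python sketch of that check follows; the function name `find_ambiguous_gaps` is hypothetical, and it assumes, as in the question, unique elements whose shared values appear in the same relative order in both arrays.

```python
def find_ambiguous_gaps(a, b):
    """Return (anchor, only_in_a, only_in_b) for every gap in which both
    arrays contain values the other lacks. `anchor` is the shared value
    preceding the gap (None for a gap before the first shared value)."""
    shared = set(a) & set(b)

    def gaps(seq):
        # Group the non-shared values of seq by the shared value before them.
        result, anchor, run = {}, None, []
        for v in seq:
            if v in shared:
                result[anchor] = run
                anchor, run = v, []
            else:
                run.append(v)
        result[anchor] = run
        return result

    gaps_a, gaps_b = gaps(a), gaps(b)
    # A gap is ambiguous only if both arrays have exclusive values in it.
    return [(k, gaps_a[k], gaps_b[k])
            for k in gaps_a if gaps_a[k] and gaps_b.get(k)]


print(find_ambiguous_gaps([2, 1, 7, 8], [2, 7, 3, 8]))  # []
print(find_ambiguous_gaps([2, 1, 7, 8], [2, 5, 3, 8]))  # [(2, [1, 7], [5, 3])]
```

With the first pair every gap is filled from only one side, so the order is fully determined; with `a2`/`b2` the gap after `2` is filled from both sides.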

My actual use case and solution:

The elements of these arrays are the indices of two data frames containing time series, so I can use pandas.merge_ordered to get the combined indices in the order I want.
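For reference, a minimal sketch of that `pandas.merge_ordered` route. The data are hypothetical (two frames sharing a `time` column of daily timestamps, each missing different days). As far as I know `merge_ordered` joins on columns rather than on the index, so frames indexed by time would need `reset_index()` first. It works because timestamps sort naturally; it would not reproduce the toy arrays above, whose order is not a sort order.

```python
import pandas as pd

# Hypothetical data: six daily timestamps; each frame is missing different days.
times = pd.date_range("2024-01-01", periods=6, freq="D")
left = pd.DataFrame({"time": times[[0, 1, 2, 5]], "x": [1.0, 2.0, 3.0, 4.0]})
right = pd.DataFrame({"time": times[[0, 2, 4, 5]], "y": [10.0, 20.0, 30.0, 40.0]})

# merge_ordered does an outer join and orders the result by the join key,
# so the combined time axis is chronological (2024-01-04 is in neither frame).
merged = pd.merge_ordered(left, right, on="time", how="outer")
print(merged["time"].dt.strftime("%Y-%m-%d").tolist())
# ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-05', '2024-01-06']
```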

My previous attempts:

numpy.union1d is not suitable, because it always sorts:

np.union1d(a, b)
# array([1, 2, 3, 7, 8]) - not what I want

Maybe pandas could help?

Both of these keep the first array intact and then append the values of the second one that are missing from it:

import pandas as pd

pd.concat([pd.Series(index=a, dtype=int), pd.Series(index=b, dtype=int)], axis=1).index.to_numpy()
pd.Index(a).union(b, sort=False).to_numpy()  # jezrael's version
# array([2, 1, 7, 8, 3]) - not what I want

Answer:

The idea is to interleave both arrays element by element (stack them and flatten in column-major order), then remove duplicates while keeping the order of first occurrence:

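import numpy as np

a = np.array([2, 1, 7,    8])
b = np.array([2,    7, 3, 8])

# stack a and b as rows (requires equal lengths), then read column by column:
# a[0], b[0], a[1], b[1], ...
c = np.vstack((a, b)).ravel(order='F')
# idx holds the position of the first occurrence of each unique value
_, idx = np.unique(c, return_index=True)

# sorting those positions restores the order of first appearance
c = c[np.sort(idx)]
print(c)
# [2 1 7 3 8]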

Pandas solution:

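import pandas as pd

# rows a and b; unstack() walks the columns, giving a[0], b[0], a[1], b[1], ...
c = pd.DataFrame([a, b]).unstack().unique()
print(c)
# [2 1 7 3 8]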

If the arrays have different lengths:

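a = np.array([2, 1, 7,    8])
b = np.array([2,    7, 3])

# pd.Series pads the shorter column with NaN; dropna() removes it so astype(int) cannot fail
c = pd.DataFrame({'a': pd.Series(a), 'b': pd.Series(b)}).stack().dropna().astype(int).unique()
print(c)
# [2 1 7 3 8]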
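Interleaving by position only gives the right answer when the dropped values happen to line up. For the question's second example (`a1`, `b1`) the stacking approach yields `[2 1 5 7 3 8]` instead of `[2 1 7 5 3 8]`. A different technique, not the one used above, is a two-pointer merge driven by set membership: walk both arrays, emit a value as soon as it is known to belong to only one array, and emit a shared value once both pointers reach it. This is a sketch under stated assumptions: `merge_preserving_order` is a hypothetical name, elements are unique, and shared values appear in the same relative order in both arrays. In ambiguous cases such as `a2`/`b2` it simply places the first array's values first.

```python
import numpy as np


def merge_preserving_order(a, b):
    """Merge two subsequences of the same unique-valued sequence."""
    a, b = np.asarray(a), np.asarray(b)
    in_a, in_b = set(a.tolist()), set(b.tolist())
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:          # shared value reached in both arrays
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] not in in_b:    # value only a has: emit it now
            out.append(a[i])
            i += 1
        elif b[j] not in in_a:    # value only b has: emit it now
            out.append(b[j])
            j += 1
        else:                     # two different shared values: orders disagree
            raise ValueError(f"inconsistent order at {a[i]} / {b[j]}")
    out.extend(a[i:])             # at most one of these tails is non-empty
    out.extend(b[j:])
    return np.array(out)


print(merge_preserving_order([2, 1, 7, 8], [2, 7, 3, 8]))     # [2 1 7 3 8]
print(merge_preserving_order([2, 1, 7, 5, 8], [2, 5, 3, 8]))  # [2 1 7 5 3 8]
print(merge_preserving_order([2, 1, 7, 8], [2, 5, 3, 8]))     # [2 1 7 5 3 8]
```

Unlike the interleaving approaches, this does not depend on where the gaps fall, and it raises instead of silently returning a wrong order if the two arrays disagree on the order of their shared values.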