Two arrays have been produced by dropping random values of an original array (with unique and unsorted elements):
orig = np.array([2, 1, 7, 5, 3, 8])
Let's say these arrays are:
a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3, 8])
Given just these two arrays, I need to merge them so that the dropped values end up in their correct positions.
The result should be:
result = np.array([2, 1, 7, 3, 8])
Another example:
a1 = np.array([2, 1, 7, 5, 8])
b1 = np.array([2, 5, 3, 8])
# the result should be: [2, 1, 7, 5, 3, 8]
Edit:
This question is ambiguous because it is unclear what to do in this situation:
a2 = np.array([2, 1, 7, 8])
b2 = np.array([2, 5, 3, 8])
# the result should be: ???
My actual solution:
The elements of these arrays are the indices of two data frames containing time series, so I can use pandas.merge_ordered to get the indices in the order I want.
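A minimal sketch of that approach, for illustration only. The frames df_a and df_b, the column names time, x and y, and the dates are all made up here, since the real data are not shown. It relies on the timestamps themselves being sorted, so an ordered outer merge on the key restores the original sequence:

```python
import pandas as pd

# Hypothetical "original" time index; the real one is not shown in the question.
times = pd.to_datetime(
    ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05"]
)

# Two frames, each missing a different timestamp (assumed data).
df_a = pd.DataFrame({"time": times[[0, 1, 2, 4]], "x": [1.0, 2.0, 3.0, 4.0]})
df_b = pd.DataFrame({"time": times[[0, 2, 3, 4]], "y": [10.0, 20.0, 30.0, 40.0]})

# merge_ordered performs an outer join by default and keeps the key ordered.
merged = pd.merge_ordered(df_a, df_b, on="time")
print(merged["time"].tolist())
```

This does not carry over to the integer toy arrays above, because their values are not sorted.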
My previous attempts:
numpy.union1d is not suitable because it always sorts:
np.union1d(a, b)
# array([1, 2, 3, 7, 8]) - not what I want
Maybe pandas could help?
These methods use the first array in full, and then append the leftover values of the second one:
pd.concat([pd.Series(index=a, dtype=int), pd.Series(index=b, dtype=int)], axis=1).index.to_numpy()
pd.Index(a).union(b, sort=False).to_numpy() # jezrael's version
# array([2, 1, 7, 8, 3]) - not what I want
Answer:
The idea is to interleave both arrays (stack them and flatten column-wise), then remove duplicates while keeping the order of first appearance:
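import numpy as np

a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3, 8])
# interleave element-wise: [2 2 1 7 7 3 8 8] (requires equal lengths)
c = np.vstack((a, b)).ravel(order='F')
# index of the first occurrence of each unique value
_, idx = np.unique(c, return_index=True)
# sorting those indices restores the order of first appearance
c = c[np.sort(idx)]
print(c)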
[2 1 7 3 8]
Pandas solution:
import pandas as pd

# unstack() walks the frame column by column, which interleaves a and b
c = pd.DataFrame([a, b]).unstack().unique()
print(c)
[2 1 7 3 8]
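If the arrays have different lengths:
a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3])
# the shorter Series is padded with NaN; dropna() removes the padding in case stack() keeps it
c = pd.DataFrame({'a': pd.Series(a), 'b': pd.Series(b)}).stack().dropna().astype(int).unique()
print(c)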
[2 1 7 3 8]