I have two Series that contain the same values but in a different order.
import numpy as np
import pandas as pd

data1 = np.array(['1', '2', '3', '4', '5', '6'])
data2 = np.array(['6', '2', '4', '3', '1', '5'])
sr1 = pd.Series(data1)
sr2 = pd.Series(data2)
The two Series are outputs of different functions, and I'm testing whether they are equal:
pd.testing.assert_series_equal(sr1, sr2, check_names=False)
This fails, of course, because the two Series are not in the same order.
I checked the online documentation, which mentions check_like,
but it does not work for me (I guess because I don't have the same version of pandas).
Is there a quick way, for a unit test, to check that these two Series are equal even if they are not in the same order, without updating any packages?
CodePudding user response:
Assuming you consider the Series equal if they have the same items, I would use:
sr1.value_counts().eq(sr2.value_counts()).all()
Or, without sorting the counts, which should be more efficient (sorting is O(n log n)):
sr1.value_counts(sort=False).eq(sr2.value_counts(sort=False)).all()
Output: True
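For use in a unit test, the comparison above can be wrapped in a small function. This is only a sketch: the name same_items is made up for illustration, and it relies on value_counts dropping missing values by default, so NaN entries are ignored here.

```python
import pandas as pd


def same_items(left: pd.Series, right: pd.Series) -> bool:
    """Hypothetical helper: True if both Series hold the same values
    with the same multiplicities, regardless of order."""
    left_counts = left.value_counts(sort=False)
    right_counts = right.value_counts(sort=False)
    # Series.eq aligns on the union of labels, so a value present in only
    # one Series is compared against NaN and yields False.
    return bool(left_counts.eq(right_counts).all())


sr1 = pd.Series(['1', '2', '3', '4', '5', '6'])
sr2 = pd.Series(['6', '2', '4', '3', '1', '5'])
assert same_items(sr1, sr2)
```

Because the counts are compared per value, duplicates matter: ['1', '1', '2'] and ['1', '2', '2'] are reported as different.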
CodePudding user response:
You can compare the sorted versions to eliminate the order (this assumes both Series have the same length):
(np.sort(sr1) == np.sort(sr2)).all()
If there are missing values, handle them separately: check that both Series have the same number of missing values, then compare the rest:
((sr1.isna().sum() == sr2.isna().sum())
and (np.sort(sr1.dropna()) == np.sort(sr2.dropna())).all())
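If a readable failure message in the test matters, one option is to sort both Series with pandas and reuse assert_series_equal from the question. The sketch below assumes that approach; the function name assert_same_values_any_order is hypothetical. sort_values places missing values last by default, and reset_index(drop=True) discards the original index so that only the values are compared.

```python
import numpy as np
import pandas as pd


def assert_same_values_any_order(left: pd.Series, right: pd.Series) -> None:
    """Hypothetical helper: raise AssertionError unless both Series
    contain the same values, ignoring order and index."""
    # Sort the values (NaN goes last by default) and drop the old index.
    left_sorted = left.sort_values().reset_index(drop=True)
    right_sorted = right.sort_values().reset_index(drop=True)
    pd.testing.assert_series_equal(left_sorted, right_sorted, check_names=False)


sr1 = pd.Series(np.array(['1', '2', '3', '4', '5', '6']))
sr2 = pd.Series(np.array(['6', '2', '4', '3', '1', '5']))
assert_same_values_any_order(sr1, sr2)  # passes silently
```

Unlike the boolean checks, this raises an AssertionError describing the mismatch, including when the two Series differ in length.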