using testing.assert_series_equal when series are not in the same order

Time:01-16

I have two series that contain the same values but in a different order.

import numpy as np
import pandas as pd

data1 = np.array(['1', '2', '3', '4', '5', '6'])
data2 = np.array(['6', '2', '4', '3', '1', '5'])
sr1 = pd.Series(data1)
sr2 = pd.Series(data2)

The two series are outputs of different functions, and I'm testing whether they are equal:

pd.testing.assert_series_equal(sr1, sr2, check_names=False)

This fails, of course, because the two series are not in the same order. I checked the online documentation, which mentions check_like, but it does not work for me (I guess because I don't have the same version of pandas). Is there a quick way, for a unit test, to check that these two series are equal even if they are not in the same order, without updating any packages?

CodePudding user response:

Assuming you consider the Series equal if they have the same items, I would use:

sr1.value_counts().eq(sr2.value_counts()).all()

Or, without sorting the counts, which should be more efficient (sorting is O(n log n)):

sr1.value_counts(sort=False).eq(sr2.value_counts(sort=False)).all()

Output: True
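
For use inside a unit test, the value_counts comparison above could be wrapped in a small helper. This is only a sketch: the name same_items is hypothetical, and it relies on Series.eq aligning the two count Series on their index, so a value that appears in only one Series compares as unequal. Note that value_counts drops missing values by default, so this sketch ignores them.

```python
import numpy as np
import pandas as pd


def same_items(left: pd.Series, right: pd.Series) -> bool:
    """Hypothetical helper: True if both Series hold the same values
    with the same multiplicities, regardless of order."""
    left_counts = left.value_counts(sort=False)
    right_counts = right.value_counts(sort=False)
    # Different numbers of distinct values can never match.
    if len(left_counts) != len(right_counts):
        return False
    # eq() aligns on the value labels; labels missing on one side compare as False.
    return bool(left_counts.eq(right_counts).all())


sr1 = pd.Series(np.array(['1', '2', '3', '4', '5', '6']))
sr2 = pd.Series(np.array(['6', '2', '4', '3', '1', '5']))
result = same_items(sr1, sr2)
```

In a test this would be used as `assert same_items(sr1, sr2)`.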

CodePudding user response:

You can compare the sorted versions to eliminate the order:

(np.sort(sr1) == np.sort(sr2)).all()
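
One caveat with the elementwise comparison above is that it assumes both series have the same length. A possible variant, sketched here with a hypothetical helper name, uses np.array_equal, which returns False when the shapes differ rather than attempting an elementwise comparison.

```python
import numpy as np
import pandas as pd


def same_sorted_values(left: pd.Series, right: pd.Series) -> bool:
    """Hypothetical helper: order-insensitive comparison that also
    copes with Series of different lengths."""
    # np.array_equal checks the shapes first, then the sorted values.
    return bool(np.array_equal(np.sort(left.to_numpy()), np.sort(right.to_numpy())))


sr1 = pd.Series(np.array(['1', '2', '3', '4', '5', '6']))
sr2 = pd.Series(np.array(['6', '2', '4', '3', '1', '5']))
result = same_sorted_values(sr1, sr2)
```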

If there are missing values, you need to handle them separately: first check that both series have the same number of missing values, and then compare the rest:

((sr1.isna().sum() == sr2.isna().sum())
  and (np.sort(sr1.dropna()) == np.sort(sr2.dropna())).all())
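
If you would rather keep pd.testing.assert_series_equal and its error messages, another option along the same lines is to sort both series and reset their indexes before comparing. The helper name below is hypothetical. Since sort_values places missing values last by default and assert_series_equal treats NaN in matching positions as equal, this sketch should also cover the missing-value case.

```python
import numpy as np
import pandas as pd


def assert_series_equal_unordered(left: pd.Series, right: pd.Series, **kwargs) -> None:
    """Hypothetical helper: assert equality while ignoring element order
    and the original index."""
    # Sort the values, then drop the shuffled index so positions line up.
    left_sorted = left.sort_values().reset_index(drop=True)
    right_sorted = right.sort_values().reset_index(drop=True)
    pd.testing.assert_series_equal(left_sorted, right_sorted, **kwargs)


sr1 = pd.Series(np.array(['1', '2', '3', '4', '5', '6']))
sr2 = pd.Series(np.array(['6', '2', '4', '3', '1', '5']))
# Raises AssertionError if the series differ; passes silently otherwise.
assert_series_equal_unordered(sr1, sr2, check_names=False)
```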