I have a pandas DataFrame as shown below:
Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
ABC Limited,1992
ABCDE Ltd,1994
import pandas as pd

tf = pd.read_clipboard(sep=',')
tf['Company_copy'] = tf['Company']
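If copying to the clipboard is not convenient, the same frame can probably be rebuilt from an inline string. This is only a sketch: reading through `io.StringIO` is a substitution for the clipboard step, and the variable name `csv_text` is made up for illustration.

```python
import io
import pandas as pd

# Same sample data as above, read from a string instead of the clipboard
# (csv_text is a hypothetical name, not part of the original setup)
csv_text = """Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
ABC Limited,1992
ABCDE Ltd,1994
"""

tf = pd.read_csv(io.StringIO(csv_text))
tf['Company_copy'] = tf['Company']
print(tf)
```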
I would like to compare each value of tf['Company'] against each value of tf['Company_copy'], but exclude pairs that share the same row number (index), i.e. a string should never be compared with itself.
For example, I want T123 Inc Ltd to be compared with the remaining 3 items. Similarly, I want ABCDE Ltd to be compared only with the remaining 3 items.
So, I tried the below with the help of this .
I expect my output to look like the one below. You can see that it doesn't contain any duplicate/same-row comparisons.
Company       Company_copy
T123 Inc Ltd  T124 PVT ltd    (T123 Inc Ltd, T124 PVT ltd)
              ABC Limited      (T123 Inc Ltd, ABC Limited)
              ABCDE Ltd          (T123 Inc Ltd, ABCDE Ltd)
T124 PVT ltd  T123 Inc Ltd    (T124 PVT ltd, T123 Inc Ltd)
              ABC Limited      (T124 PVT ltd, ABC Limited)
              ABCDE Ltd          (T124 PVT ltd, ABCDE Ltd)
ABC Limited   T123 Inc Ltd     (ABC Limited, T123 Inc Ltd)
              T124 PVT ltd     (ABC Limited, T124 PVT ltd)
              ABCDE Ltd           (ABC Limited, ABCDE Ltd)
ABCDE Ltd     T123 Inc Ltd       (ABCDE Ltd, T123 Inc Ltd)
              T124 PVT ltd       (ABCDE Ltd, T124 PVT ltd)
              ABC Limited         (ABCDE Ltd, ABC Limited)
CodePudding user response:
You can build a MultiIndex from the product of both columns and then keep only the entries where the first and second levels are not equal:
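# Cartesian product of both columns as a MultiIndex, converted to a Series of tuples
compare = pd.MultiIndex.from_product([tf['Company'].astype(str), tf['Company_copy'].astype(str)]).to_series()
# Keep only the pairs whose first and second levels differ
compare = compare[compare.index.get_level_values(0) != compare.index.get_level_values(1)]
print(compare)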
Company       Company_copy
T123 Inc Ltd  T124 PVT ltd    (T123 Inc Ltd, T124 PVT ltd)
              ABC Limited      (T123 Inc Ltd, ABC Limited)
              ABCDE Ltd          (T123 Inc Ltd, ABCDE Ltd)
T124 PVT ltd  T123 Inc Ltd    (T124 PVT ltd, T123 Inc Ltd)
              ABC Limited      (T124 PVT ltd, ABC Limited)
              ABCDE Ltd          (T124 PVT ltd, ABCDE Ltd)
ABC Limited   T123 Inc Ltd     (ABC Limited, T123 Inc Ltd)
              T124 PVT ltd     (ABC Limited, T124 PVT ltd)
              ABCDE Ltd           (ABC Limited, ABCDE Ltd)
ABCDE Ltd     T123 Inc Ltd       (ABCDE Ltd, T123 Inc Ltd)
              T124 PVT ltd       (ABCDE Ltd, T124 PVT ltd)
              ABC Limited         (ABCDE Ltd, ABC Limited)
dtype: object
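Note that the filter above compares values, so two different rows holding the same company name would also be dropped. If you need to exclude strictly by row position instead, a cross join on a row counter is one option. This is a minimal sketch, assuming pandas 1.2 or later (for `how='cross'`); the helper column names `row` and `row_copy` are made up for illustration.

```python
import pandas as pd

# Sample data from the question, built inline so the sketch is self-contained
tf = pd.DataFrame({
    'Company': ['T123 Inc Ltd', 'T124 PVT ltd', 'ABC Limited', 'ABCDE Ltd'],
    'year': [1990, 1991, 1992, 1994],
})

# Attach an explicit row counter (hypothetical helper columns 'row' / 'row_copy')
left = pd.DataFrame({'row': range(len(tf)), 'Company': tf['Company'].to_numpy()})
right = left.rename(columns={'row': 'row_copy', 'Company': 'Company_copy'})

# Cross join every row with every row, then drop pairs coming from the same row
pairs = left.merge(right, how='cross')
pairs = pairs[pairs['row'] != pairs['row_copy']]
pairs = pairs[['Company', 'Company_copy']].reset_index(drop=True)

print(len(pairs))  # 4 * 4 - 4 = 12 pairs
```

With unique company names, as in the sample data, this should give the same 12 pairs as the MultiIndex approach; the two only differ when a name is repeated in several rows.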