I have a large DataFrame of items, simplified below. I am looking for a good way to find the item (A, B, or C) in each row that is repeated two or more times.
For example, in the first row (ro1) the result is A, and in the second row (ro2) it is B.
Simplified df:
df = pd.DataFrame({'C1': ['A', 'B', 'A', 'A', 'C'],
                   'C2': ['B', 'A', 'A', 'C', 'B'],
                   'C3': ['A', 'B', 'A', 'C', 'C']},
                  index=['ro1', 'ro2', 'ro3', 'ro4', 'ro5'])
CodePudding user response:
As mozway suggested, it is not clear what output format you want, so I will assume you need a list.
You can try something like this:
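import pandas as pd
from collections import Counter

holder = []
for index in range(len(df)):
    # Count how often each value occurs in the current row
    temp = Counter(df.iloc[index, :].values)
    # Keep only the values that occur at least twice
    holder.append(','.join([key for key, value in temp.items() if value >= 2]))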
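If you would rather keep the result aligned with the row labels instead of a plain list, the same Counter idea can be applied row by row with `DataFrame.apply`. This is only a sketch: the helper name `repeated_items` and its `min_count` parameter are my own, not something from the question.

```python
import pandas as pd
from collections import Counter

df = pd.DataFrame({'C1': ['A', 'B', 'A', 'A', 'C'],
                   'C2': ['B', 'A', 'A', 'C', 'B'],
                   'C3': ['A', 'B', 'A', 'C', 'C']},
                  index=['ro1', 'ro2', 'ro3', 'ro4', 'ro5'])

def repeated_items(row, min_count=2):
    # Hypothetical helper: join every value seen at least min_count times in the row
    counts = Counter(row)
    return ','.join(key for key, value in counts.items() if value >= min_count)

# axis=1 passes each row to the helper; the result is a Series indexed by ro1..ro5
result = df.apply(repeated_items, axis=1)
print(result)
```

A row with no repeated value would give an empty string here, the same as in the loop version.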
CodePudding user response:
As you have three columns and each row always contains a repeated value, you can conveniently use mode:
df.mode(1)[0]
Output:
ro1 A
ro2 B
ro3 A
ro4 C
ro5 C
Name: 0, dtype: object
If a row might contain only unique values (e.g. A/B/C), you need to check that the mode actually occurs more than once in that row:
m = df.mode(1)[0]
m2 = df.eq(m, axis=0).sum(1).le(1)
m.mask(m2)
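To see the masking in action, here is a small self-contained sketch. The extra row `ro6` (A/B/C, all unique) is an assumption I added for illustration; it is not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({'C1': ['A', 'B', 'A', 'A', 'C', 'A'],
                   'C2': ['B', 'A', 'A', 'C', 'B', 'B'],
                   'C3': ['A', 'B', 'A', 'C', 'C', 'C']},
                  index=['ro1', 'ro2', 'ro3', 'ro4', 'ro5', 'ro6'])

# First mode of each row (for an all-unique row this is just one of its values)
m = df.mode(axis=1)[0]
# True where the candidate value appears at most once in its row
m2 = df.eq(m, axis=0).sum(axis=1).le(1)
# Replace those candidates with NaN, keep the genuine repeats
out = m.mask(m2)
print(out)
```

Rows ro1 to ro5 keep the same values as before, while ro6 becomes NaN because no value repeats there.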