A | B | C | D | #_identical | value |
---|---|---|---|---|---|
1 | 1 | 1 | 2 | 3 | 1 |
3 | 3 | 1 | 2 | 2 | 3 |
4 | 4 | 4 | 4 | 4 | 4 |
1 | 2 | 1 | 2 | 2 | [1,2] |
Here A, B, C, and D are value columns, '#_identical' is the number of times the most frequent value occurs among A, B, C, and D, and 'value' is the value (or values) that repeat.
CodePudding user response:
Here is one approach using a custom function:
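import pandas as pd

def count(s):
    # Count each value in the row and keep only the repeated ones.
    # Assumes every row has at least one repeated value.
    c = s.value_counts()
    c = c[c > 1]
    return pd.Series({'#_identical': c.unique().tolist()[0],
                      'value': c.index.to_list()})

df.join(df.apply(count, axis=1))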
output:
   A  B  C  D  #_identical   value
0  1  1  1  2            3     [1]
1  3  3  1  2            2     [3]
2  4  4  4  4            4     [4]
3  1  2  1  2            2  [1, 2]
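One caveat about the function above: if a row has no repeated value, `c[c>1]` is empty and `c.unique().tolist()[0]` raises an IndexError. Below is a minimal sketch of a guarded variant, written with `collections.Counter` on plain lists so it runs without pandas. The name `most_repeated` and the choice to return an empty list for rows without repeats are assumptions, not part of the original answer. It also keeps only the value(s) that reach the highest count.

```python
from collections import Counter

def most_repeated(values):
    """Return the highest repeat count in a row and the value(s) reaching it."""
    counts = Counter(values)
    top = max(counts.values(), default=0)
    if top < 2:
        # Assumption: with no repeated value, report the count as-is
        # (1, or 0 for an empty row) and no values.
        return {'#_identical': top, 'value': []}
    return {'#_identical': top,
            'value': [v for v, n in counts.items() if n == top]}

# Sample rows from the question, plus one row with no repeats.
rows = [[1, 1, 1, 2], [3, 3, 1, 2], [4, 4, 4, 4], [1, 2, 1, 2], [1, 2, 3, 4]]
results = [most_repeated(r) for r in rows]
for row, res in zip(rows, results):
    print(row, res)
```

With a DataFrame, something like `df.join(df.apply(lambda r: pd.Series(most_repeated(r.tolist())), axis=1))` should produce the same two columns, though that line is not tested here.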
CodePudding user response:
You can use Counter:

from collections import Counter

# Count how often each value occurs in every row
df['counter'] = df.apply(Counter, axis=1)
# Keep the value(s) with the highest count, and that count
df['value'] = df['counter'].apply(lambda x: [key for key in x.keys() if x[key] == max(x.values())])
df['#_identical'] = df['counter'].apply(lambda x: [x[key] for key in x.keys() if x[key] == max(x.values())])
# Collapse duplicate counts, e.g. [2, 2] -> [2]
df['#_identical'] = df['#_identical'].apply(lambda x: list(set(x)))
df.drop(['counter'], axis=1, inplace=True)
Output of print(df):
   A  B  C  D   value #_identical
0  1  1  1  2     [1]         [3]
1  3  3  1  2     [3]         [2]
2  4  4  4  4     [4]         [4]
3  1  2  1  2  [1, 2]         [2]
You can convert the lists to scalars if you like, but I imagine you will need lists once you have more data than shown here.
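If you do want scalars, one possible way is to unwrap only the one-item lists and leave the rest alone. This is a rough sketch on plain lists. The helper name `unwrap` is hypothetical, and the sample data is copied from the output above.

```python
def unwrap(items):
    """Return the lone element of a one-item list; leave other lists unchanged."""
    return items[0] if len(items) == 1 else items

# 'value' and '#_identical' columns as shown in the printed output
values = [[1], [3], [4], [1, 2]]
counts = [[3], [2], [4], [2]]

print([unwrap(v) for v in values])  # [1, 3, 4, [1, 2]]
print([unwrap(c) for c in counts])  # [3, 2, 4, 2]
```

On the DataFrame this would presumably be applied per column, for example `df['value'] = df['value'].apply(unwrap)`. Note that the 'value' column then mixes scalars and lists whenever a row has a tie.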