I have a dataframe with several rows and I need to assign a number (new column) according to the values of the other columns:
- If all values in the different columns are the same the new value would be 5,
- if at least 4 values are the same it would be 4,
- if at least 3 values are the same, 3
- and so on until all are different values and the new value would be 0.
p1 p2 p3 p4 p5
0 1 1 1 1 1
1 2 3 2 2 2
2 2 2 3 4 1
3 4 4 1 1 1
4 2 1 2 3 5
5 1 2 3 4 5
Output
p1 p2 p3 p4 p5 new
0 1 1 1 1 1 5
1 2 3 2 2 2 4
2 2 2 3 4 1 2
3 4 4 1 1 1 3
4 2 1 2 3 5 2
5 1 2 3 4 5 0
For the first example where all are the same, I used np.where and it works:
df['new'] = np.where((df['p1'] == df['p2']) & (df['p1'] == df['p3']) & (df['p1'] == df['p4']) & (df['p1'] == df['p5']), 5, 0)
CodePudding user response:
You can make use of collections.Counter
to count how many items are equal. I come up with this way:
import numpy as np
import pandas as pd
from collections import Counter
data = [
[1,1,1,1,1],
[2,3,2,2,2],
[2,2,3,4,1],
[4,4,1,1,1],
[2,1,2,3,5],
[1,2,3,4,5]
]
df = pd.DataFrame(data, columns=['p1','p2','p3','p4','p5'])
print(df)
new_col = []
for i in range(len(df)):
a = dict(Counter(df.iloc[i].tolist()))
m = a[max(a, key=a.get)]
if m > 1:
new_col.append(a[max(a, key=a.get)])
else:
new_col.append(0)
df['new'] = new_col
print(df)
Output:
p1 p2 p3 p4 p5 new
0 1 1 1 1 1 5
1 2 3 2 2 2 4
2 2 2 3 4 1 2
3 4 4 1 1 1 3
4 2 1 2 3 5 2
5 1 2 3 4 5 0
CodePudding user response:
A slightly different approach from that suggested by @DeusDev would be:
def computeDiffs(rw):
g = Counter(rw)
m = max([g[x] for x in g.keys()])
if m == 1:
return 0
return m
Then use:
df['new'] = df.apply(lambda row: computeDiffs(row), axis=1)