Home > Software design >  Comparing columns with at least some values are the same Python
Comparing columns with at least some values are the same Python

Time:11-10

I have a dataframe with several rows and I need to assign a number (new column) according to the values of the other columns:

  • If all values in the different columns are the same the new value would be 5,
  • if at least 4 values are the same it would be 4,
  • if at least 3 values are the same, 3
  • and so on until all are different values and the new value would be 0.
  p1 p2 p3 p4 p5
0 1  1  1  1  1
1 2  3  2  2  2
2 2  2  3  4  1
3 4  4  1  1  1
4 2  1  2  3  5 
5 1  2  3  4  5

Output

 p1 p2 p3 p4 p5 new
0 1  1  1  1  1  5
1 2  3  2  2  2  4
2 2  2  3  4  1  2
3 4  4  1  1  1  3
4 2  1  2  3  5  2
5 1  2  3  4  5  0

For the first example where all are the same, I used np.where and it works:

df['new'] = np.where((df['p1'] == df['p2']) & (df['p1'] == df['p3']) & (df['p1'] == df['p4']) & (df['p1'] == df['p5']), 5, 0)

CodePudding user response:

You can make use of collections.Counter to count how many items are equal. I come up with this way:

import numpy as np
import pandas as pd
from collections import Counter

data = [
    [1,1,1,1,1],
    [2,3,2,2,2],
    [2,2,3,4,1],
    [4,4,1,1,1],
    [2,1,2,3,5],
    [1,2,3,4,5]
]
df = pd.DataFrame(data, columns=['p1','p2','p3','p4','p5'])
print(df)

new_col = []
for i in range(len(df)):
    a = dict(Counter(df.iloc[i].tolist()))
    m = a[max(a, key=a.get)]
    if m > 1:
        new_col.append(a[max(a, key=a.get)])
    else:
        new_col.append(0)


df['new'] = new_col

print(df)

Output:

   p1  p2  p3  p4  p5  new
0   1   1   1   1   1    5
1   2   3   2   2   2    4
2   2   2   3   4   1    2
3   4   4   1   1   1    3
4   2   1   2   3   5    2
5   1   2   3   4   5    0

CodePudding user response:

A slightly different approach from that suggested by @DeusDev would be:

def computeDiffs(rw):
    g = Counter(rw)
    m = max([g[x] for x in g.keys()])
    if m == 1:
        return 0
    return m
  

Then use:

df['new'] = df.apply(lambda row: computeDiffs(row), axis=1)
  
  • Related