Counting repetitions and writing on a new column


Is it possible to compare values across N columns, row by row, in the same dataframe, and add a new column counting the repetitions?
From:

id | column1 | column2 | column3
0  | z       | x       | x       
1  | y       | y       | y       
2  | x       | x       | x       
3  | x       | x       | x       
4  | z       | y       | x      
5  | w       | w       | w     
6  | w       | w       | w     
7  | w       | w       | w      

To:

id | column1 | column2 | column3 | counter
0  | z       | x       | x       | 0
1  | y       | y       | y       | 1
2  | x       | x       | x       | 2
3  | x       | x       | x       | 2
4  | z       | y       | x       | 0
5  | w       | w       | w       | 3
6  | w       | w       | w       | 3
7  | w       | w       | w       | 3

Something like this: if (column1[someRow] == column1[anotherRow] and column2[someRow] == column2[anotherRow] and column3[someRow] == column3[anotherRow]) then increment counter[someRow]

CodePudding user response:

You can use:

# keep only relevant columns
df2 = df.drop(columns='id')

# are the values identical in the row?
m = df2.eq(df2.iloc[:, 0], axis=0).all(axis=1)

# count the number of occurrences per group
# and only keep the output when all values are identical
df['counter'] = df2.groupby(list(df2)).transform('size').where(m, 0)

# for older pandas versions
# df['counter'] = df2.groupby(list(df2))[df2.columns[0]].transform('size').where(m, 0)

Output (using the 6-row input below, where the w row appears only once):

   id column1 column2 column3  counter
0   0       z       x       x        0
1   1       y       y       y        1
2   2       x       x       x        2
3   3       x       x       x        2
4   4       z       y       x        0
5   5       w       w       w        1

Used input:

df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5],
                   'column1': ['z', 'y', 'x', 'x', 'z', 'w'],
                   'column2': ['x', 'y', 'x', 'x', 'y', 'w'],
                   'column3': ['x', 'y', 'x', 'x', 'x', 'w'],
                  })
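For completeness, a minimal self-contained sketch applying the same approach to the full 8-row frame from the question (it uses the column-selection form throughout, which also works on older pandas):

```python
import pandas as pd

df = pd.DataFrame({'id': range(8),
                   'column1': ['z', 'y', 'x', 'x', 'z', 'w', 'w', 'w'],
                   'column2': ['x', 'y', 'x', 'x', 'y', 'w', 'w', 'w'],
                   'column3': ['x', 'y', 'x', 'x', 'x', 'w', 'w', 'w']})

# keep only the columns to compare
df2 = df.drop(columns='id')

# True where every value in the row equals the first column's value
m = df2.eq(df2.iloc[:, 0], axis=0).all(axis=1)

# group size per (column1, column2, column3) combination,
# zeroed out where the row's values are not all identical
df['counter'] = df2.groupby(list(df2))['column1'].transform('size').where(m, 0)

print(df)
```

With this input, `counter` comes out as `[0, 1, 2, 2, 0, 3, 3, 3]`, matching the expected output in the question.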

CodePudding user response:

You can do:

s = df.drop("id", axis=1).nunique(1)
df["counter"] = (
    df.groupby(df.where(s.eq(1))["column1"]).transform("size").fillna(0).astype(int)
)

#For previous versions of pandas
df["counter"] = (
    df.groupby(df.where(s.eq(1))["column1"])["column1"]
    .transform("size")
    .fillna(0)
    .astype(int)
)

print(df)

   id column1 column2 column3  counter
0   0       z       x       x        0
1   1       y       y       y        1
2   2       x       x       x        2
3   3       x       x       x        2
4   4       z       y       x        0
5   5       w       w       w        1

What we are doing here is getting the number of unique elements across axis 1 (excluding the id column) using nunique, then taking the rows that have only one unique value and doing a groupby.transform with size.
