Is it possible to compare values between N columns, row by row, on the same dataframe and set a new column counteing the repetitions?
From:
id | column1 | column2 | column3
0 | z | x | x
1 | y | y | y
2 | x | x | x
3 | x | x | x
4 | z | y | x
5 | w | w | w
6 | w | w | w
7 | w | w | w
To:
id | column1 | column2 | column3 | counter
0 | z | x | x | 0
1 | y | y | y | 1
2 | x | x | x | 2
3 | x | x | x | 2
4 | z | y | x | 0
5 | w | w | w | 3
6 | w | w | w | 3
7 | w | w | w | 3
Something like that: if(column1[someRow] == column1[anotherRow] & column2[someRow] == column2[anotherRow] & column3[someRow] == column3[anotherRow])
then counter[someRow]
CodePudding user response:
You can use:
# keep only relevant columns
df2 = df.drop(columns='id')
# are the values identical in the row?
m = df2.eq(df2.iloc[:, 0], axis=0).all(axis=1)
# count the number of occurrences per group
# and only keep the output when all values are identical
df['counter'] = df2.groupby(list(df2)).transform('size').where(m, 0)
# for older pandas versions
# df['counter'] = df2.groupby(list(df2))[df2.columns[0]].transform('size').where(m, 0)
Output (with an extra row for clarity):
id column1 column2 column3 counter
0 0 z x x 0
1 1 y y y 1
2 2 x x x 2
3 3 x x x 2
4 4 z y x 0
5 5 w w w 1
Used input:
df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5],
'column1': ['z', 'y', 'x', 'x', 'z', 'w'],
'column2': ['x', 'y', 'x', 'x', 'y', 'w'],
'column3': ['x', 'y', 'x', 'x', 'x', 'w'],
})
CodePudding user response:
You can do:
s = df.drop("id", axis=1).nunique(1)
df["counter"] = (
df.groupby(df.where(s.eq(1))["column1"]).transform("size").fillna(0).astype(int)
)
#For previous versions of pandas
df["counter"] = (
df.groupby(df.where(s.eq(1))["column1"])["column1"]
.transform("size")
.fillna(0)
.astype(int)
)
print(df)
id column1 column2 column3 counter
0 0 z x x 0
1 1 y y y 1
2 2 x x x 2
3 3 x x x 2
4 4 z y x 0
5 5 w w w 1
What were are doing here is get number of unique elements across axis 1 (excluding column id
) using nunique
and then take those rows which have only one unique value and do a groupby.transform
with size
.