Pandas - compare and count repetitions-CodePudding

Is it possible to compare values on the same dataframe and set a new column to count the repetitions with pandas?
From:

id | column1 | column2
0  | a       | a
1  | b       | b
2  | a       | c
3  | c       | c
4  | b       | b

To:

id | column1 | column2 | count
0  | a       | a       | 1
1  | b       | b       | 2
2  | a       | c       | 0
3  | c       | c       | 1
4  | b       | b       | 2

CodePudding user response：

You can use filter to get the columns of interest, identify the rows with all identical values and use it to create a custom group for a groupby.transform('size'):

# filter columns in "column"
df2 = df.filter(like='column')
# keep rows with identical columns
s = df2.iloc[:, 0].where(df2.eq(s, axis=0).all(axis=1))
# groupby unique value and get the count
df['count'] = df.groupby(s).transform('size').fillna(0, downcast='infer')

NB. this is working with an arbitrary number of columns

If you really have only two columns, you can simplify to:

m = df['column1'].eq(df['column2'])
df['count'] = (m.groupby(df['column1'].where(m))
                .transform('size').fillna(0, downcast='infer')
              )

Output:

   id column1 column2  count
0   0       a       a      1
1   1       b       b      2
2   2       a       c      0
3   3       c       c      1
4   4       b       b      2

CodePudding user response：

Compare both columns to helper m column and use GroupBy.transform with sum:

df['count'] = (df.assign(m=df.column1.eq(df.column2))
                 .groupby(['column1', 'column2'])['m']
                 .transform('sum'))
print (df)
   id column1 column2  count
0   0       a       a      1
1   1       b       b      2
2   2       a       c      0
3   3       c       c      1
4   4       b       b      2