Python Pandas how to find matching values by label-CodePudding

I have a csv file that looks something like this:

mark	time	value1	value2
1	14:22:02	5	2
1	14:22:05	8	4
2	14:25:02	1	1
2	14:26:05	4	7
3	15:12:08	5	2
3	15:12:11	5	4
3	15:12:15	5	2
3	15:12:17	8	4

I would like to output all the matches by label 1 and 3

Expected result:

Number of matches is the number of intersections with the same symbols of the label 1 and 3 That is, if there are 5 in mark 1 and Value 1 column, then it counts the entire number of intersections with mark3 in Value 1

By two columns of value

mark	value1	value2	Number of matches
1-3	5	2	2
1-3	8	4	1

For value 1

mark	value1	Number of matches
1-3	5	3
1-3	8	1

For value 2

mark	value2	Number of matches
1-3	2	2
1-3	4	2

CodePudding user response：

You can use a groupby on the filtered DataFrame, then filter again to have a count > 1:

target = ['value1', 'value2']

(df.loc[df['mark'].isin([1,3])]
   .astype({'mark': 'str'})
   .groupby(target, as_index=False)
   .agg(**{'mark': ('mark', lambda g: '-'.join(dict.fromkeys(g))),
           'Num matches': ('mark', 'count')
          })
   .loc[lambda d: d['Num matches'].gt(1)]
 )

Output:

   value1  value2 mark  Num matches
0       5       2  1-3            3
2       8       4  1-3            2