Home > database >  Python Pandas how to find matching values by label
Python Pandas how to find matching values by label

Time:08-28

I have a csv file that looks something like this:

mark time value1 value2
1 14:22:02 5 2
1 14:22:05 8 4
2 14:25:02 1 1
2 14:26:05 4 7
3 15:12:08 5 2
3 15:12:11 5 4
3 15:12:15 5 2
3 15:12:17 8 4

I would like to output all the matches by label 1 and 3

Expected result:

Number of matches is the number of intersections with the same symbols of the label 1 and 3 That is, if there are 5 in mark 1 and Value 1 column, then it counts the entire number of intersections with mark3 in Value 1

By two columns of value

mark value1 value2 Number of matches
1-3 5 2 2
1-3 8 4 1

For value 1

mark value1 Number of matches
1-3 5 3
1-3 8 1

For value 2

mark value2 Number of matches
1-3 2 2
1-3 4 2

CodePudding user response:

You can use a groupby on the filtered DataFrame, then filter again to have a count > 1:

target = ['value1', 'value2']

(df.loc[df['mark'].isin([1,3])]
   .astype({'mark': 'str'})
   .groupby(target, as_index=False)
   .agg(**{'mark': ('mark', lambda g: '-'.join(dict.fromkeys(g))),
           'Num matches': ('mark', 'count')
          })
   .loc[lambda d: d['Num matches'].gt(1)]
 )

Output:

   value1  value2 mark  Num matches
0       5       2  1-3            3
2       8       4  1-3            2
  • Related