Home > database >  Pandas: Determine if columns are matched
Pandas: Determine if columns are matched

Time:11-09

I'm trying to eliminate all rows that match in col0 and col1, but don't have a pair of -1, 1 between rows (for example in the dataframe below there isn't a a2, b1, -1 row). I was trying to come up with someway to do this, but was groupby and getting multiindex and not getting anywhere...

#   no a2, b1, -1
df = pd.DataFrame([
    ['a1', 'b1', -1, 0/1],
    ['a1', 'b1',  1, 1/1],
    ['a1', 'b2', -1, 2/1],
    ['a1', 'b2',  1, 1/2],
    ['a2', 'b1',  1, 1/3],
    ['a2', 'b2', -1, 2/3],
    ['a2', 'b2',  1, 4/1]
], columns=['col0', 'col1', 'col2', 'val'])

# desired output
# a1, b1, -1, 0.0
# a1, b1,  1, 1.0
# a1, b2, -1, 2.0
# a1, b2,  1, 0.5
# a2, b2, -1, 0.66667
# a2, b2,  1, 4.0

CodePudding user response:

We can use groupby filter to test if there are at least 1 (any) of each value (-1 and 1) per group with Series.any:

result_df = df.groupby(['col0', 'col1']).filter(
    lambda x: x['col2'].eq(-1).any() and x['col2'].eq(1).any()
)

result_df:

  col0 col1  col2       val
0   a1   b1    -1  0.000000
1   a1   b1     1  1.000000
2   a1   b2    -1  2.000000
3   a1   b2     1  0.500000
5   a2   b2    -1  0.666667
6   a2   b2     1  4.000000

CodePudding user response:

Not a perfect solution but you can use df['col2'].abs() to group rows:

>>> df[df.groupby(['col0', 'col1', df['col2'].abs()])['col2'] \
         .transform('count').eq(2)]

  col0 col1  col2       val
0   a1   b1    -1  0.000000
1   a1   b1     1  1.000000
2   a1   b2    -1  2.000000
3   a1   b2     1  0.500000
5   a2   b2    -1  0.666667
6   a2   b2     1  4.000000

Another solution (maybe better) using pivot:

>>> df.pivot(index=['col0', 'col1'], columns='col2', values='val') \
      .dropna(how='any').stack().rename('val').reset_index()

  col0 col1  col2       val
0   a1   b1    -1  0.000000
1   a1   b1     1  1.000000
2   a1   b2    -1  2.000000
3   a1   b2     1  0.500000
4   a2   b2    -1  0.666667
5   a2   b2     1  4.000000

CodePudding user response:

If it is always supposed to be exactly one (-1, 1) pair in each group we could just sum:

df.loc[df.groupby(['col0', 'col1'])['col2'].transform('sum') == 0]
  • Related