I'm trying to eliminate all rows that match in col0 and col1 but don't have a -1, 1 pair between rows (for example, in the dataframe below there is no a2, b1, -1 row). I was trying to come up with some way to do this with groupby, but I kept getting a MultiIndex and not getting anywhere...
# no a2, b1, -1
import pandas as pd

df = pd.DataFrame([
    ['a1', 'b1', -1, 0/1],
    ['a1', 'b1', 1, 1/1],
    ['a1', 'b2', -1, 2/1],
    ['a1', 'b2', 1, 1/2],
    ['a2', 'b1', 1, 1/3],
    ['a2', 'b2', -1, 2/3],
    ['a2', 'b2', 1, 4/1]
], columns=['col0', 'col1', 'col2', 'val'])
# desired output
# a1, b1, -1, 0.0
# a1, b1, 1, 1.0
# a1, b2, -1, 2.0
# a1, b2, 1, 0.5
# a2, b2, -1, 0.66667
# a2, b2, 1, 4.0
CodePudding user response:
We can use groupby filter to test whether each (col0, col1) group contains at least one of each value (-1 and 1) with Series.any:
# keep only the groups whose col2 contains both -1 and 1
result_df = df.groupby(['col0', 'col1']).filter(
    lambda x: x['col2'].eq(-1).any() and x['col2'].eq(1).any()
)
result_df:
col0 col1 col2 val
0 a1 b1 -1 0.000000
1 a1 b1 1 1.000000
2 a1 b2 -1 2.000000
3 a1 b2 1 0.500000
5 a2 b2 -1 0.666667
6 a2 b2 1 4.000000
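As a variation (my own sketch, not part of the answer above), the same test can be done without a Python-level lambda by building two boolean masks with transform('any'):

# sketch: vectorized equivalent of the filter above (assumes the same df)
has_neg = df['col2'].eq(-1).groupby([df['col0'], df['col1']]).transform('any')
has_pos = df['col2'].eq(1).groupby([df['col0'], df['col1']]).transform('any')
result_df = df[has_neg & has_pos]

This avoids calling a Python function per group, which can matter when there are many groups.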
CodePudding user response:
Not a perfect solution, but you can use df['col2'].abs() to group rows:
>>> df[df.groupby(['col0', 'col1', df['col2'].abs()])['col2'] \
.transform('count').eq(2)]
col0 col1 col2 val
0 a1 b1 -1 0.000000
1 a1 b1 1 1.000000
2 a1 b2 -1 2.000000
3 a1 b2 1 0.500000
5 a2 b2 -1 0.666667
6 a2 b2 1 4.000000
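To make the mechanics visible (an illustration I'm adding, not part of the answer): taking abs(col2) puts -1 and 1 into the same subgroup, so a per-subgroup count of 2 means the pair is complete.

# per-row subgroup sizes when grouping on abs(col2) (assumes the sample df above)
counts = df.groupby(['col0', 'col1', df['col2'].abs()])['col2'].transform('count')
print(counts.tolist())  # [2, 2, 2, 2, 1, 2, 2] -> row 4 (a2, b1, 1) has no partner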
Another solution (maybe better) using pivot:
>>> df.pivot(index=['col0', 'col1'], columns='col2', values='val') \
.dropna(how='any').stack().rename('val').reset_index()
col0 col1 col2 val
0 a1 b1 -1 0.000000
1 a1 b1 1 1.000000
2 a1 b2 -1 2.000000
3 a1 b2 1 0.500000
4 a2 b2 -1 0.666667
5 a2 b2 1 4.000000
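For context (my note, not from the answer): pivot builds a wide frame with one column per col2 value, so any (col0, col1) group missing either -1 or 1 gets a NaN there and is removed by dropna(how='any'); stack() then returns the survivors to long form. Note that pivot raises if a (col0, col1, col2) combination appears more than once.

# intermediate wide frame (illustrative, assumes the sample df above)
wide = df.pivot(index=['col0', 'col1'], columns='col2', values='val')
# the (a2, b1) row is NaN under the -1 column, so dropna(how='any') drops it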
CodePudding user response:
If each group is always supposed to contain exactly one (-1, 1) pair, we could just sum:
df.loc[df.groupby(['col0', 'col1'])['col2'].transform('sum') == 0]
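A quick check (my addition) of the per-group sums on the sample df: 0 for (a1, b1), (a1, b2) and (a2, b2), and 1 for (a2, b1), so only the unmatched a2, b1 row is dropped. It does rely on that assumption; a group like [1, -1, 1] would sum to 1 and be dropped even though it contains a pair.

# per-group sums for the sample df (assumes df as defined in the question)
sums = df.groupby(['col0', 'col1'])['col2'].transform('sum')
result_df = df.loc[sums == 0]  # same rows as the groupby/filter answer above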