I have a pandas
dataframe that looks like this:
import pandas as pd
foo = pd.DataFrame({'id_p': [1,1,2,2,3,3,3,4,4],
'id_d_b': [True, True, False, True, True, True,False,False,False],
'id_d_i': [False, False, True, False,False,False,True,True,True]})
foo
id_p id_d_b id_d_i
0 1 True False
1 1 True False
2 2 False True
3 2 True False
4 3 True False
5 3 True False
6 3 False True
7 4 False True
8 4 False True
I would like to select the id_p
s which has have at least one True
at id_d_b
and
at least one True
at id_d_i
I tried this
foo['id_d_b'] = foo['id_d_b'].astype(int)
foo['id_d_i'] = foo['id_d_i'].astype(int)
foo['has_id_d_b'] = foo.groupby('id_p')['id_d_b'].transform('max')
foo['has_id_d_i'] = foo.groupby('id_p')['id_d_i'].transform('max')
foo['result'] = foo['has_id_d_b'] foo['has_id_d_i'] # if this is >1 then that specific id_p has at least one id_d_b and id_d_i
foo['result'] = foo.eval('result > 1')
foo
id_p id_d_b id_d_i has_id_d_b has_id_d_i result
0 1 1 0 1 0 False
1 1 1 0 1 0 False
2 2 0 1 1 1 True
3 2 1 0 1 1 True
4 3 1 0 1 1 True
5 3 1 0 1 1 True
6 3 0 1 1 1 True
7 4 0 1 0 1 False
8 4 0 1 0 1 False
which gives the correct result, but I am looking for a "one-liner" solution
CodePudding user response:
You can use: groupby.any
to check if any value per group is True, (eventually on a subset of columns), then ensure all columns are True
with all
. With isin
you can get the relevant indices (for slicing or indexing if needed).
m = foo.groupby('id_p').any().all(1)
# if other columns
# m = foo.groupby('id_p')[['id_d_b', 'id_d_i']].any().all(1)
foo['result'] = foo['id_p'].isin(m[m].index)
For the indices:
print(m[m].index)
# Int64Index([2, 3], dtype='int64', name='id_p')
or, using groupby.transform
, which will be more expensive:
foo['result'] = foo.groupby('id_p').transform('any').all(1)
# if other columns
# foo['result'] = this foo.groupby('id_p')[['id_d_b', 'id_d_i']].transform('any').all(1)
output:
id_p id_d_b id_d_i result
0 1 True False False
1 1 True False False
2 2 False True True
3 2 True False True
4 3 True False True
5 3 True False True
6 3 False True True
7 4 False True False
8 4 False True False
CodePudding user response:
Use Series.map
with Series
created by aggregate GroupBy.any
with DataFrame.all
:
foo['result'] = foo['id_p'].map(foo.groupby('id_p')[['id_d_b', 'id_d_i']].any().all(axis=1))
print (foo)
id_p id_d_b id_d_i result
0 1 True False False
1 1 True False False
2 2 False True True
3 2 True False True
4 3 True False True
5 3 True False True
6 3 False True True
7 4 False True False
8 4 False True False