I have a df as follows:
import pandas as pd
data = {'retailer': [2, 2, 2, 2, 2, 5, 5, 5, 5, 5],
'store': [1, 1, 1, 1, 1, 7, 7, 7, 7, 7],
'week':[2021110701, 2021101301, 2021100601, 2021092901, 2021092201, 2021110701, 2021101301, 2021100601, 2021092901, 2021092201],
'isPeriod': [False, True, False, False, False, False, False, True, False, False],
'quadId': [2021112804, 2021103104, 2021103104, 2021103104, 2021103104, 2021100304, 2021100304, 2021103104, 2021103104, 2021103104]
}
data = pd.DataFrame(data)
I would like to locate the rows where 'isPeriod' == True, get the corresponding 'quadId' values for those rows, and then filter my entire dataframe to keep only the rows with that 'quadId'.
For example, in my df we can see that in the second row, 'isPeriod' is True and the corresponding 'quadId' is 2021103104. So I would like my filtered df to only contain the rows where 'quadId' is 2021103104.
In this case my example filtered df would look like:
data = {'retailer': [2, 2, 2, 2, 5, 5, 5],
'store': [1, 1, 1, 1, 7, 7, 7],
'week':[2021101301, 2021100601, 2021092901, 2021092201, 2021100601, 2021092901, 2021092201],
'isPeriod': [True, False, False, False, True, False, False],
'quadId': [2021103104, 2021103104, 2021103104, 2021103104, 2021103104, 2021103104, 2021103104]
}
data = pd.DataFrame(data)
Is there a way I can do this? Thanks! (Also, if there are multiple True values for isPeriod, the quadIds will all be the same for them.)
Answer:
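Use loc to collect the quadId values from the rows where isPeriod is True, then use isin to keep only the rows whose quadId is among those values: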
valid_quarters = data.loc[data.isPeriod, 'quadId']
data[data['quadId'].isin(valid_quarters)]
Output:
   retailer  store        week  isPeriod      quadId
1         2      1  2021101301      True  2021103104
2         2      1  2021100601     False  2021103104
3         2      1  2021092901     False  2021103104
4         2      1  2021092201     False  2021103104
7         5      7  2021100601      True  2021103104
8         5      7  2021092901     False  2021103104
9         5      7  2021092201     False  2021103104
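For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the question's sample data. The names filtered and same_quad are my own, not from the original posts, and the scalar comparison at the end relies on the asker's statement that every True row shares a single quadId.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    'retailer': [2, 2, 2, 2, 2, 5, 5, 5, 5, 5],
    'store': [1, 1, 1, 1, 1, 7, 7, 7, 7, 7],
    'week': [2021110701, 2021101301, 2021100601, 2021092901, 2021092201,
             2021110701, 2021101301, 2021100601, 2021092901, 2021092201],
    'isPeriod': [False, True, False, False, False,
                 False, False, True, False, False],
    'quadId': [2021112804, 2021103104, 2021103104, 2021103104, 2021103104,
               2021100304, 2021100304, 2021103104, 2021103104, 2021103104],
})

# quadId values found on the rows flagged isPeriod == True (deduplicated).
valid_quarters = data.loc[data['isPeriod'], 'quadId'].unique()

# Keep every row whose quadId is one of those values.
filtered = data[data['quadId'].isin(valid_quarters)]

# Assumption from the question: all True rows share one quadId, so a plain
# equality check against that single value selects the same rows.
if len(valid_quarters) == 1:
    same_quad = data[data['quadId'] == valid_quarters[0]]

print(filtered)
```

The isin version also works if the True rows ever carry more than one distinct quadId, so it is probably the safer default; the equality check is only a shortcut for the single-value case.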