I have a list of tuples like this one (PRODUCT_ID, COUNTRY_CODE):
[(1111, 'CO'),
(2222, 'CO'),
(1111, 'BR')]
and a dataframe like this one:
df = pd.DataFrame({
'COUNTRY_CODE': ['CO','CO','CO','BR','BR','BR','CO'],
'VERTICAL_GROUP_ID': [2,2,3,2,3,3,3],
'SUB_VERTICAL': ['SUPER','SUPER','HOME','LICOR','SPORTS','HOME','TECH'],
'PRODUCT_ID': [1111,3333,1111,4444,1111,2222,2222],
'SHOWN': [7,8,12,14,16,1,13],
})
How can I filter the dataframe so that I get a resulting dataframe like this, filtered with only the values from the list of tuples by PRODUCT_ID and COUNTRY_CODE?
COUNTRY_CODE | VERTICAL_GROUP_ID | SUB_VERTICAL | PRODUCT_ID | SHOWN |
---|---|---|---|---|
CO | 2 | SUPER | 1111 | 7 |
CO | 3 | HOME | 1111 | 12 |
BR | 3 | SPORTS | 1111 | 16 |
CO | 3 | TECH | 2222 | 13 |
CodePudding user response:
Lets use Multiindex.isin
df[df.set_index(['PRODUCT_ID', 'COUNTRY_CODE']).index.isin(tup)]
Alternatively you can create a new dataframe from tuples then inner merge with the given dataframe
filters = pd.DataFrame(tup, columns=['PRODUCT_ID', 'COUNTRY_CODE'])
df.merge(filters)
COUNTRY_CODE VERTICAL_GROUP_ID SUB_VERTICAL PRODUCT_ID SHOWN
0 CO 2 SUPER 1111 7
1 CO 3 HOME 1111 12
2 BR 3 SPORTS 1111 16
3 CO 3 TECH 2222 13
CodePudding user response:
You can zip
two columns : PRODUCT_ID
and COUNTRY_CODE
and check whether the values are in list
of tuple
s or not and return desired df
.
lst = [(1111, 'CO'),(2222, 'CO'),(1111, 'BR')]
m = [tpl in lst for tpl in zip(df['PRODUCT_ID'], df['COUNTRY_CODE'])]
# m -> [True, False, True, False, True, False, True]
df_new = df[m]
print(df_new)
Output:
COUNTRY_CODE VERTICAL_GROUP_ID SUB_VERTICAL PRODUCT_ID SHOWN
0 CO 2 SUPER 1111 7
2 CO 3 HOME 1111 12
4 BR 3 SPORTS 1111 16
6 CO 3 TECH 2222 13
Explanation:
>>> tuple(zip(df['PRODUCT_ID'], df['COUNTRY_CODE']))
((1111, 'CO'), # True : in [(1111, 'CO'),(2222, 'CO'),(1111, 'BR')]
(3333, 'CO'), # False : Not in [(1111, 'CO'),(2222, 'CO'),(1111, 'BR')]
(1111, 'CO'),
(4444, 'BR'),
(1111, 'BR'),
(2222, 'BR'),
(2222, 'CO'))