Pandas: get unique rows with unique value in column grouped by another column value-CodePudding

Let's say I hace a datframe like this:

df = pd.DataFrame({
'COUNTRY_CODE': ['CO','CO','CO','BR','BR','BR'], 
'VERTICAL_GROUP_ID': [2,2,3,2,3,3], 
'SUB_VERTICAL': ['SUPER','SUPER','HOME','LICOR','SPORTS','HOME'], 
'PRODUCT_ID': [1111,1111,1111,1111,1111,2222], 
'SHOWN': [7,8,12,14,16,1], 
})

I want to get in another dataframe, for each COUNTRY_CODE/PRODUCT_ID combination, only ONE row for each VERTICAL_GROUP.

So for the df above, I'd like to get something like this:

COUNTRY_CODE	VERTICAL_GROUP_ID	SUB_VERTICAL	PRODUCT_ID	SHOWN
CO	2	SUPER	1111	7
CO	3	HOME	1111	12
BR	2	LICOR	1111	14
BR	3	SPORTS	1111	16
BR	3	HOME	2222	1

It doesn't matter which rows of each COUNTRY_CODE/PRODUCT_ID/VERTICAL_GROUP combination I keep, as long as I get only 1 for each VERTICAL_GROUP.

Whats the best way of doing this? I tried mixing a grouby("PRODUCT_ID") with a drop_duplicates(subset=['VERTICAL_GROUP_ID']), but I keepe doing something wrong,because I keep getting errors like

AttributeError: 'DataFrameGroupBy' object has no attribute 'drop_duplicates'

CodePudding user response：

You could try:

df.groupby(['COUNTRY_CODE', 'VERTICAL_GROUP_ID', 'PRODUCT_ID']).agg('first').reset_index()

  COUNTRY_CODE  VERTICAL_GROUP_ID  PRODUCT_ID SUB_VERTICAL  SHOWN
0           BR                  2        1111        LICOR     14
1           BR                  3        1111       SPORTS     16
2           BR                  3        2222         HOME      1
3           CO                  2        1111        SUPER      7
4           CO                  3        1111         HOME     12