How to use itertools for getting unique row items using pandas?

Time:04-01

I have a dataframe like the one shown below:

ID,Region,Supplier,year,output
1,ANZ,AB,2021,1
2,ANZ,ABC,2022,1
3,ANZ,ABC,2022,1
4,ANZ,ABE,2021,0
5,ANZ,ABE,2021,1
6,ANZ,ABQ,2021,1
7,ANZ,ABW,2021,1
8,AUS,ABO,2020,1
9,KOREA,ABR,2019,0
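To try the snippets on this page, the sample data can be loaded with pandas. This is a minimal sketch that assumes the table is pasted as CSV text and the dataframe is named df (the name the answers use).

```python
import io

import pandas as pd

# The sample data from the question, pasted as CSV text
csv_text = """ID,Region,Supplier,year,output
1,ANZ,AB,2021,1
2,ANZ,ABC,2022,1
3,ANZ,ABC,2022,1
4,ANZ,ABE,2021,0
5,ANZ,ABE,2021,1
6,ANZ,ABQ,2021,1
7,ANZ,ABW,2021,1
8,AUS,ABO,2020,1
9,KOREA,ABR,2019,0
"""

# read_csv accepts any file-like object, so wrap the string in StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (9, 5)
```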

I am trying to generate the unique combinations of Region and Supplier values. Instead of using groupby, I was thinking of doing it with zip_longest.

So, I tried the following:

for i,j in itertools.zip_longest(region_values,supplier_values,fillvalue="ANZ"):
    print(i,j)

But this produces incorrect pairs of i and j.

I want to get each unique combination exactly as it appears in a row. I don't want to generate new combinations that do not exist in the data.

Currently, it produces the following incorrect output:

ANZ AB
AUS ABC    # incorrect: this combination does not exist in the data
KOREA ABE  # incorrect: this combination does not exist in the data
ANZ ABQ
ANZ ABW
ANZ ABO
ANZ ABR
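A minimal reproduction of the problem, assuming region_values and supplier_values were built with .unique() (the question doesn't show how they were created, but this assumption reproduces the output above). zip_longest has no notion of rows: it pairs the i-th element of one sequence with the i-th element of the other and pads the shorter one with fillvalue, so a pair only matches the data by coincidence of position.

```python
import itertools

import pandas as pd

# Region and Supplier columns from the question's sample data
df = pd.DataFrame({
    "Region": ["ANZ"] * 7 + ["AUS", "KOREA"],
    "Supplier": ["AB", "ABC", "ABC", "ABE", "ABE", "ABQ", "ABW", "ABO", "ABR"],
})

# Assumption: both lists were built independently with .unique()
region_values = df["Region"].unique()      # 3 values: ANZ, AUS, KOREA
supplier_values = df["Supplier"].unique()  # 7 values

# Pairs are formed by list position, not by row; the 3 regions run out
# after the third pair and fillvalue="ANZ" is used for the rest
pairs = list(itertools.zip_longest(region_values, supplier_values, fillvalue="ANZ"))
for i, j in pairs:
    print(i, j)
```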

I expect the output to look like this:

ANZ AB
ANZ ABC
ANZ ABE
ANZ ABQ
ANZ ABW
AUS ABO
KOREA ABR 

I am using zip_longest because I then want to use the pairs from the zip object to filter the dataframe on these 2 columns.
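If the goal is to filter the dataframe on both columns, a sketch along these lines may work: take the unique pairs row by row (here with drop_duplicates, one possible choice) and use each pair in a boolean mask. The groupby line is included only for comparison, since it yields the same subsets in a single pass.

```python
import pandas as pd

# Region, Supplier and output columns from the question's sample data
df = pd.DataFrame({
    "Region": ["ANZ"] * 7 + ["AUS", "KOREA"],
    "Supplier": ["AB", "ABC", "ABC", "ABE", "ABE", "ABQ", "ABW", "ABO", "ABR"],
    "output": [1, 1, 1, 0, 1, 1, 1, 1, 0],
})

# Unique (Region, Supplier) pairs, taken from actual rows in first-seen order
unique_rows = df.drop_duplicates(["Region", "Supplier"])
pairs = list(zip(unique_rows["Region"], unique_rows["Supplier"]))

# Filter the full dataframe once per pair with a mask on both columns
subsets = {}
for region, supplier in pairs:
    mask = (df["Region"] == region) & (df["Supplier"] == supplier)
    subsets[(region, supplier)] = df[mask]

print(subsets[("ANZ", "ABE")])

# For comparison only: groupby on both columns gives the same subsets,
# keyed by (Region, Supplier) tuples
grouped = {key: group for key, group in df.groupby(["Region", "Supplier"], sort=False)}
```

A boolean mask per pair scans the whole dataframe once per combination, so with many combinations groupby is likely the cheaper option.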

Answer:

If ordering is important, you need to remove duplicates across both columns together, so use drop_duplicates instead of unique:

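# data is the question's dataframe
column_name = "Region"
col_name = "Supplier"
# drop rows whose (Region, Supplier) pair was already seen; the first
# occurrence is kept, so the original row order is preserved
df = data.drop_duplicates([column_name, col_name])

for i, j in zip(df[column_name], df[col_name]):
    print(i, j)

Output:

ANZ AB
ANZ ABC
ANZ ABE
ANZ ABQ
ANZ ABW
AUS ABO
KOREA ABR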
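A variant of the same idea, sketched under the assumption that only the two key columns are needed: selecting them first and calling itertuples(index=False, name=None) yields plain tuples, so no zip is required.

```python
import pandas as pd

# Region and Supplier columns from the question's sample data
df = pd.DataFrame({
    "Region": ["ANZ"] * 7 + ["AUS", "KOREA"],
    "Supplier": ["AB", "ABC", "ABC", "ABE", "ABE", "ABQ", "ABW", "ABO", "ABR"],
})

# Keep only the key columns, drop repeated pairs (first occurrence wins,
# so row order is preserved), then iterate the rows as plain tuples
pairs = list(
    df[["Region", "Supplier"]]
    .drop_duplicates()
    .itertuples(index=False, name=None)
)
for region, supplier in pairs:
    print(region, supplier)
```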

Answer:

It looks like you want a set:

set(zip(df['Region'], df['Supplier']))

output:

{('ANZ', 'AB'),
 ('ANZ', 'ABC'),
 ('ANZ', 'ABE'),
 ('ANZ', 'ABQ'),
 ('ANZ', 'ABW'),
 ('AUS', 'ABO'),
 ('KOREA', 'ABR')}

For iteration:

for r, s in set(zip(df['Region'], df['Supplier'])):
    pass
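Because a set is unordered, iterating over it directly gives no guaranteed order. A short sketch that sorts the pairs for a repeatable loop and also uses the set for membership tests, which is one way to reject combinations that don't exist in the data:

```python
import pandas as pd

# Region and Supplier columns from the question's sample data
df = pd.DataFrame({
    "Region": ["ANZ"] * 7 + ["AUS", "KOREA"],
    "Supplier": ["AB", "ABC", "ABC", "ABE", "ABE", "ABQ", "ABW", "ABO", "ABR"],
})

pairs = set(zip(df["Region"], df["Supplier"]))

# Sorting gives a deterministic order (alphabetical, not row order)
for region, supplier in sorted(pairs):
    print(region, supplier)

# Membership tests on a set are O(1) on average
print(("ANZ", "ABC") in pairs)  # True
print(("AUS", "ABC") in pairs)  # False
```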

If order is important, use dict.fromkeys, since dictionaries preserve insertion order (guaranteed from Python 3.7):

for a,b in dict.fromkeys(zip(df['Region'], df['Supplier'])).keys():
    print(a,b)
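A quick check, sketched on the question's Region and Supplier columns, that dict.fromkeys keeps the first-seen order and gives the same sequence as the drop_duplicates approach:

```python
import pandas as pd

# Region and Supplier columns from the question's sample data
df = pd.DataFrame({
    "Region": ["ANZ"] * 7 + ["AUS", "KOREA"],
    "Supplier": ["AB", "ABC", "ABC", "ABE", "ABE", "ABQ", "ABW", "ABO", "ABR"],
})

# dict keys are unique and keep insertion order, so repeated pairs are
# dropped while the first-seen order is kept
pairs = list(dict.fromkeys(zip(df["Region"], df["Supplier"])))
print(pairs)

# drop_duplicates also keeps first occurrences in row order
dedup = df.drop_duplicates(["Region", "Supplier"])
print(pairs == list(zip(dedup["Region"], dedup["Supplier"])))  # True
```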