I have a dataframe like the one shown below:
ID,Region,Supplier,year,output
1,ANZ,AB,2021,1
2,ANZ,ABC,2022,1
3,ANZ,ABC,2022,1
4,ANZ,ABE,2021,0
5,ANZ,ABE,2021,1
6,ANZ,ABQ,2021,1
7,ANZ,ABW,2021,1
8,AUS,ABO,2020,1
9,KOREA,ABR,2019,0
I am trying to generate the unique combinations of region and supplier values. Instead of a groupby, I was thinking of doing it via zip_longest.
So I tried the following:
import itertools
for i, j in itertools.zip_longest(region_values, supplier_values, fillvalue="ANZ"):
    print(i, j)
But the above results in incorrect entries for i and j.
I want each unique combination that actually occurs in a row. I don't want to generate new combinations that are not present in the data.
Currently, this results in incorrect output as shown below
ANZ AB
AUS ABC # incorrect: this combination does not exist in the data
KOREA ABE # incorrect: this combination does not exist in the data
ANZ ABQ
ANZ ABW
ANZ ABO
ANZ ABR
I expect my output to look like this:
ANZ AB
ANZ ABC
ANZ ABE
ANZ ABQ
ANZ ABW
AUS ABO
KOREA ABR
I use zip_longest because afterwards I want to use the pairs from the zip object to filter the dataframe on these 2 columns.
CodePudding user response:
If ordering is important, you need to remove duplicates across both columns together, so use drop_duplicates instead of unique:
column_name = "Region"
col_name = "Supplier"
df = data.drop_duplicates([column_name, col_name])
for i, j in zip(df[column_name], df[col_name]):
    print(i, j)
ANZ AB
ANZ ABC
ANZ ABE
ANZ ABQ
ANZ ABW
AUS ABO
KOREA ABR
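Since the question mentions filtering the dataframe on both columns afterwards, here is a minimal sketch of one way that could look. The sample data is copied from the question; the name `wanted` and the choice of which two pairs to keep are assumptions made purely for illustration, and an inner merge is only one of several possible ways to filter on a pair of columns.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Region": ["ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "AUS", "KOREA"],
    "Supplier": ["AB", "ABC", "ABC", "ABE", "ABE", "ABQ", "ABW", "ABO", "ABR"],
    "year": [2021, 2022, 2022, 2021, 2021, 2021, 2021, 2020, 2019],
    "output": [1, 1, 1, 0, 1, 1, 1, 1, 0],
})

# Keep the first row of each (Region, Supplier) pair, in original row order.
pairs = data.drop_duplicates(["Region", "Supplier"])[["Region", "Supplier"]]

# Hypothetical selection: pick two of the unique pairs by position,
# here (ANZ, ABC) and (AUS, ABO).
wanted = pairs.iloc[[1, 5]]

# An inner merge on both columns keeps only the rows of `data`
# whose (Region, Supplier) pair appears in `wanted`.
filtered = data.merge(wanted, on=["Region", "Supplier"], how="inner")
print(filtered)
```

Because the pairs come from real rows, the merge cannot match a combination that is absent from the data.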
CodePudding user response:
It looks like you want a set:
set(zip(df['Region'], df['Supplier']))
output:
{('ANZ', 'AB'),
('ANZ', 'ABC'),
('ANZ', 'ABE'),
('ANZ', 'ABQ'),
('ANZ', 'ABW'),
('AUS', 'ABO'),
('KOREA', 'ABR')}
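A set also makes membership tests cheap, which may help with the filtering step mentioned in the question. The following is a rough sketch under that assumption; the sample data is taken from the question, while `wanted` and its two pairs are hypothetical values chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question (only the columns needed here).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Region": ["ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "AUS", "KOREA"],
    "Supplier": ["AB", "ABC", "ABC", "ABE", "ABE", "ABQ", "ABW", "ABO", "ABR"],
})

# Unique row-wise pairs; a set has no guaranteed iteration order.
unique_pairs = set(zip(df["Region"], df["Supplier"]))

# Hypothetical subset of pairs to keep.
wanted = {("ANZ", "ABE"), ("KOREA", "ABR")}

# Build a boolean mask by testing each row's pair against the set.
mask = [pair in wanted for pair in zip(df["Region"], df["Supplier"])]
filtered = df.loc[mask]
print(filtered)
```

If a reproducible order is needed when printing the set, wrapping it in `sorted()` is one option.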
For iteration:
for r, s in set(zip(df['Region'], df['Supplier'])):
    pass
If order is important, use dict.fromkeys:
for a, b in dict.fromkeys(zip(df['Region'], df['Supplier'])):
    print(a, b)
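To illustrate why the original zip_longest attempt produced pairs that are not in the data, here is a small sketch using plain lists. It assumes that `region_values` and `supplier_values` in the question held the unique values of each column, which the question does not show; under that assumption the misaligned output is reproduced exactly.

```python
from itertools import zip_longest

# Column values copied from the question, in row order.
regions = ["ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "ANZ", "AUS", "KOREA"]
suppliers = ["AB", "ABC", "ABC", "ABE", "ABE", "ABQ", "ABW", "ABO", "ABR"]

# Pair row-wise first, then de-duplicate: dict keys keep insertion order.
ordered_pairs = list(dict.fromkeys(zip(regions, suppliers)))

# Assumed reconstruction of the question's inputs: each column
# de-duplicated on its own, which discards the row alignment.
region_values = list(dict.fromkeys(regions))      # 3 unique regions
supplier_values = list(dict.fromkeys(suppliers))  # 7 unique suppliers

# Zipping the two unique lists pairs values by position, not by row,
# and pads the shorter list with the fill value.
misaligned = list(zip_longest(region_values, supplier_values, fillvalue="ANZ"))

print(ordered_pairs)
print(misaligned)
```

The key point is the order of operations: zipping the full columns before removing duplicates keeps each region with the supplier from the same row.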