Home > database >  Keep unique pairs from 2 columns
Keep unique pairs from 2 columns

Time:06-24

I have this dataset:

from_to_weight = {'From':['Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Fabio Crestani','Fabio Crestani','Fabio Crestani','Tomasz Kusmierczyk','Tomasz Kusmierczyk','Tomasz Kusmierczyk','Kjetil Nørvåg','Kjetil Nørvåg','Simranjit Singh Sachar','Simranjit Singh Sachar','Simranjit Singh Sachar','Nicholas Diakopoulos','Nicholas Diakopoulos','Nicholas Diakopoulos','Nicholas Diakopoulos','Nasser Alsaedi', 'Nasser Alsaedi','Pete Burnap', 'Pete Burnap','Omer F. Rana','Omer F. Rana','Omer F. Rana','Omer F. Rana','Omer F. Rana'],
'To':['Fabio Crestani', 'Tomasz Kusmierczyk', 'Simranjit Singh Sachar', 'Nicholas Diakopoulos', 'Omer F. Rana', 'Dimitrios Rafailidis', 'Nicholas Diakopoulos', 'Omer F. Rana','Dimitrios Rafailidis','Kjetil Nørvåg', 'Simranjit Singh Sachar','Tomasz Kusmierczyk', 'Kjetil Nørvåg','Dimitrios Rafailidis','Tomasz Kusmierczyk','Nicholas Diakopoulos','Dimitrios Rafailidis','Fabio Crestani', 'Simranjit Singh Sachar', 'Omer F. Rana','Pete Burnap', 'Omer F. Rana','Nasser Alsaedi', 'Omer F. Rana','Dimitrios Rafailidis', 'Fabio Crestani', 'Nicholas Diakopoulos', 'Nasser Alsaedi', 'Pete Burnap'],
'weight':[2,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]}

from_to_weight_df = pd.DataFrame(data=from_to_weight) 

enter image description here

I want to exclude the rows in which from to = to from. To make it more clear, I want to have this output:

from_to_weight = {'From':['Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Fabio Crestani','Fabio Crestani','Tomasz Kusmierczyk','Tomasz Kusmierczyk','Kjetil Nørvåg','Simranjit Singh Sachar','Nicholas Diakopoulos','Nasser Alsaedi','Nasser Alsaedi','Pete Burnap'],'To':['Fabio Crestani','Tomasz Kusmierczyk','Simranjit Singh Sachar','Nicholas Diakopoulos','Omer F. Rana','Nicholas Diakopoulos','Omer F. Rana','Kjetil Nørvåg','Simranjit Singh Sachar','Kjetil Nørvåg','Nicholas Diakopoulos','Omer F. Rana','Pete Burnap','Omer F. Rana','Omer F. Rana'],'weight':[2,1,1,1,1,1,1,1,1,1,1,1,1,1,1]}

from_to_weight_df = pd.DataFrame(data=from_to_weight)
from_to_weight_df

enter image description here

Any idea?

CodePudding user response:

A naive approach would be to iterate over every "from" "to" pair and check if the next pair's "to" and "from" are respectively equal to an existing "from" "to" pair:

seenPairs = set()
indexes = []

for i in range(len(df)):
  _from = df["From"][i] # using underline since "from" is a keyword
  _to = df["To"][i]

  if (_from , _to) not in seenPairs:
    seenPairs.add((_to, _from))
    indexes.append(True)
  else:
    indexes.append(False)

# Getting the desired rows
from_to_weight = df[indexes]
    

There's probably a way to do this in a single line. But until someone shows up with such a solution, you may have mine!

CodePudding user response:

One way using frozenset with duplicated:

# sample data
tmp = pd.DataFrame([["a", "b"], ["b", "a"], ["c", "c"], ["a", "b"]])

new_df = tmp[~tmp.apply(frozenset, axis=1).duplicated()]
print(new_df)

Output:

   0  1
0  a  b
2  c  c

CodePudding user response:

Here's a way:

pairs = set()
from_to_weight_df = from_to_weight_df.reindex(
    from_to_weight_df.apply(
    lambda x: None if (x.From, x.To) in pairs else (pairs.add((x.To, x.From)), 0)[1], axis=1
    ).dropna().index).reset_index(drop=True)

Input:

                      From                      To  weight
0     Dimitrios Rafailidis          Fabio Crestani       2
1     Dimitrios Rafailidis      Tomasz Kusmierczyk       1
2     Dimitrios Rafailidis  Simranjit Singh Sachar       1
3     Dimitrios Rafailidis    Nicholas Diakopoulos       1
4     Dimitrios Rafailidis            Omer F. Rana       1
5           Fabio Crestani    Dimitrios Rafailidis       2
6           Fabio Crestani    Nicholas Diakopoulos       1
7           Fabio Crestani            Omer F. Rana       1
8       Tomasz Kusmierczyk    Dimitrios Rafailidis       1
9       Tomasz Kusmierczyk           Kjetil Nørvåg       1
10      Tomasz Kusmierczyk  Simranjit Singh Sachar       1
11           Kjetil Nørvåg      Tomasz Kusmierczyk       1
12           Kjetil Nørvåg           Kjetil Nørvåg       1
13  Simranjit Singh Sachar    Dimitrios Rafailidis       1
14  Simranjit Singh Sachar      Tomasz Kusmierczyk       1
15  Simranjit Singh Sachar    Nicholas Diakopoulos       1
16    Nicholas Diakopoulos    Dimitrios Rafailidis       1
17    Nicholas Diakopoulos          Fabio Crestani       1
18    Nicholas Diakopoulos  Simranjit Singh Sachar       1
19    Nicholas Diakopoulos            Omer F. Rana       1
20          Nasser Alsaedi             Pete Burnap       1
21          Nasser Alsaedi            Omer F. Rana       1
22             Pete Burnap          Nasser Alsaedi       1
23             Pete Burnap            Omer F. Rana       1
24            Omer F. Rana    Dimitrios Rafailidis       1
25            Omer F. Rana          Fabio Crestani       1
26            Omer F. Rana    Nicholas Diakopoulos       1
27            Omer F. Rana          Nasser Alsaedi       1
28            Omer F. Rana             Pete Burnap       1

Output:

                      From                      To  weight
0     Dimitrios Rafailidis          Fabio Crestani       2
1     Dimitrios Rafailidis      Tomasz Kusmierczyk       1
2     Dimitrios Rafailidis  Simranjit Singh Sachar       1
3     Dimitrios Rafailidis    Nicholas Diakopoulos       1
4     Dimitrios Rafailidis            Omer F. Rana       1
5           Fabio Crestani    Nicholas Diakopoulos       1
6           Fabio Crestani            Omer F. Rana       1
7       Tomasz Kusmierczyk           Kjetil Nørvåg       1
8       Tomasz Kusmierczyk  Simranjit Singh Sachar       1
9            Kjetil Nørvåg           Kjetil Nørvåg       1
10  Simranjit Singh Sachar    Nicholas Diakopoulos       1
11    Nicholas Diakopoulos            Omer F. Rana       1
12          Nasser Alsaedi             Pete Burnap       1
13          Nasser Alsaedi            Omer F. Rana       1
14             Pete Burnap            Omer F. Rana       1
  • Related