I have this dataset:
from_to_weight = {'From':['Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Fabio Crestani','Fabio Crestani','Fabio Crestani','Tomasz Kusmierczyk','Tomasz Kusmierczyk','Tomasz Kusmierczyk','Kjetil Nørvåg','Kjetil Nørvåg','Simranjit Singh Sachar','Simranjit Singh Sachar','Simranjit Singh Sachar','Nicholas Diakopoulos','Nicholas Diakopoulos','Nicholas Diakopoulos','Nicholas Diakopoulos','Nasser Alsaedi', 'Nasser Alsaedi','Pete Burnap', 'Pete Burnap','Omer F. Rana','Omer F. Rana','Omer F. Rana','Omer F. Rana','Omer F. Rana'],
'To':['Fabio Crestani', 'Tomasz Kusmierczyk', 'Simranjit Singh Sachar', 'Nicholas Diakopoulos', 'Omer F. Rana', 'Dimitrios Rafailidis', 'Nicholas Diakopoulos', 'Omer F. Rana','Dimitrios Rafailidis','Kjetil Nørvåg', 'Simranjit Singh Sachar','Tomasz Kusmierczyk', 'Kjetil Nørvåg','Dimitrios Rafailidis','Tomasz Kusmierczyk','Nicholas Diakopoulos','Dimitrios Rafailidis','Fabio Crestani', 'Simranjit Singh Sachar', 'Omer F. Rana','Pete Burnap', 'Omer F. Rana','Nasser Alsaedi', 'Omer F. Rana','Dimitrios Rafailidis', 'Fabio Crestani', 'Nicholas Diakopoulos', 'Nasser Alsaedi', 'Pete Burnap'],
'weight':[2,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]}
from_to_weight_df = pd.DataFrame(data=from_to_weight)
I want to exclude the rows in which from to = to from. To make it more clear, I want to have this output:
from_to_weight = {'From':['Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Dimitrios Rafailidis','Fabio Crestani','Fabio Crestani','Tomasz Kusmierczyk','Tomasz Kusmierczyk','Kjetil Nørvåg','Simranjit Singh Sachar','Nicholas Diakopoulos','Nasser Alsaedi','Nasser Alsaedi','Pete Burnap'],'To':['Fabio Crestani','Tomasz Kusmierczyk','Simranjit Singh Sachar','Nicholas Diakopoulos','Omer F. Rana','Nicholas Diakopoulos','Omer F. Rana','Kjetil Nørvåg','Simranjit Singh Sachar','Kjetil Nørvåg','Nicholas Diakopoulos','Omer F. Rana','Pete Burnap','Omer F. Rana','Omer F. Rana'],'weight':[2,1,1,1,1,1,1,1,1,1,1,1,1,1,1]}
from_to_weight_df = pd.DataFrame(data=from_to_weight)
from_to_weight_df
Any idea?
CodePudding user response:
A naive approach would be to iterate over every "from" "to" pair and check if the next pair's "to" and "from" are respectively equal to an existing "from" "to" pair:
seenPairs = set()
indexes = []
for i in range(len(df)):
_from = df["From"][i] # using underline since "from" is a keyword
_to = df["To"][i]
if (_from , _to) not in seenPairs:
seenPairs.add((_to, _from))
indexes.append(True)
else:
indexes.append(False)
# Getting the desired rows
from_to_weight = df[indexes]
There's probably a way to do this in a single line. But until someone shows up with such a solution, you may have mine!
CodePudding user response:
One way using frozenset
with duplicated
:
# sample data
tmp = pd.DataFrame([["a", "b"], ["b", "a"], ["c", "c"], ["a", "b"]])
new_df = tmp[~tmp.apply(frozenset, axis=1).duplicated()]
print(new_df)
Output:
0 1
0 a b
2 c c
CodePudding user response:
Here's a way:
pairs = set()
from_to_weight_df = from_to_weight_df.reindex(
from_to_weight_df.apply(
lambda x: None if (x.From, x.To) in pairs else (pairs.add((x.To, x.From)), 0)[1], axis=1
).dropna().index).reset_index(drop=True)
Input:
From To weight
0 Dimitrios Rafailidis Fabio Crestani 2
1 Dimitrios Rafailidis Tomasz Kusmierczyk 1
2 Dimitrios Rafailidis Simranjit Singh Sachar 1
3 Dimitrios Rafailidis Nicholas Diakopoulos 1
4 Dimitrios Rafailidis Omer F. Rana 1
5 Fabio Crestani Dimitrios Rafailidis 2
6 Fabio Crestani Nicholas Diakopoulos 1
7 Fabio Crestani Omer F. Rana 1
8 Tomasz Kusmierczyk Dimitrios Rafailidis 1
9 Tomasz Kusmierczyk Kjetil Nørvåg 1
10 Tomasz Kusmierczyk Simranjit Singh Sachar 1
11 Kjetil Nørvåg Tomasz Kusmierczyk 1
12 Kjetil Nørvåg Kjetil Nørvåg 1
13 Simranjit Singh Sachar Dimitrios Rafailidis 1
14 Simranjit Singh Sachar Tomasz Kusmierczyk 1
15 Simranjit Singh Sachar Nicholas Diakopoulos 1
16 Nicholas Diakopoulos Dimitrios Rafailidis 1
17 Nicholas Diakopoulos Fabio Crestani 1
18 Nicholas Diakopoulos Simranjit Singh Sachar 1
19 Nicholas Diakopoulos Omer F. Rana 1
20 Nasser Alsaedi Pete Burnap 1
21 Nasser Alsaedi Omer F. Rana 1
22 Pete Burnap Nasser Alsaedi 1
23 Pete Burnap Omer F. Rana 1
24 Omer F. Rana Dimitrios Rafailidis 1
25 Omer F. Rana Fabio Crestani 1
26 Omer F. Rana Nicholas Diakopoulos 1
27 Omer F. Rana Nasser Alsaedi 1
28 Omer F. Rana Pete Burnap 1
Output:
From To weight
0 Dimitrios Rafailidis Fabio Crestani 2
1 Dimitrios Rafailidis Tomasz Kusmierczyk 1
2 Dimitrios Rafailidis Simranjit Singh Sachar 1
3 Dimitrios Rafailidis Nicholas Diakopoulos 1
4 Dimitrios Rafailidis Omer F. Rana 1
5 Fabio Crestani Nicholas Diakopoulos 1
6 Fabio Crestani Omer F. Rana 1
7 Tomasz Kusmierczyk Kjetil Nørvåg 1
8 Tomasz Kusmierczyk Simranjit Singh Sachar 1
9 Kjetil Nørvåg Kjetil Nørvåg 1
10 Simranjit Singh Sachar Nicholas Diakopoulos 1
11 Nicholas Diakopoulos Omer F. Rana 1
12 Nasser Alsaedi Pete Burnap 1
13 Nasser Alsaedi Omer F. Rana 1
14 Pete Burnap Omer F. Rana 1